1. My name is Dr. William Jack, Research Director for the DNA Enzymes
Division at New England Biolabs Inc. My resume is attached. 

2. I have been studying the structure and function of DNA polymerases for
[...]
polymerases in US Patent 5,500,363. In this patent, the United States
Patent and Trademark Office recognized the validity of our claim to a class of
[...]
ability to hybridize under defined conditions to various specified DNA
sequences. The group was exemplified by T. litoralis (Vent), GBD (Deep
Vent), and 9°N DNA Polymerases.

4. We also found that this group of polymerases had a high degree of amino 
acid sequence identity. A comparative three-dimensional alignment of 
members of this group of enzymes showed a high degree of structural 
conservation, consistent with the observed high degree of primary amino 
acid sequence identity/similarity. See for example, Vent (Rodriguez, et al., 
2000), Tgo (Hopfner, et al., 1999), D. Tok (Zhao, et al., 1999), and KOD 
(Hashimoto, et al., 2001) DNA Polymerases. 

5. The structural equivalence of this group of polymerases is further 
supported by experiments reported in Example 10 of the above application 
in which we show that mutation of an analogous residue in Vent and 9°N 
DNA Polymerases yields enzymes with equivalent acyclonucleotide 
incorporation efficiencies. 

6. We discovered that this group of enzymes is capable of efficiently utilizing 
acyclonucleotides as substrates. We demonstrated this property using four 
examples of polymerases within this tightly defined group. Any molecular 
biologist of ordinary skill in the art would expect from these findings that this 
property would occur in all members of the enzyme group defined above. 

7. Additionally, my colleagues and I have published articles in peer reviewed 
journals discussing the physical basis for the preferential incorporation of 
acyclonucleotides, and also for the enhanced incorporation with Vent A488L 
and 9°N A485L DNA Polymerase mutants. See Gardner, et al. (2004) on 
page 11841, column 1, paragraph 2 and page 11841, column 2, paragraph 
[...]

8. I assert that the combination of the high degree of homogeneity in DNA
and amino acid sequences of archaeon DNA polymerases, plus the structural 
evidence that modification of specific amino acids alters enzyme specificity, 
would be sufficient to assure a person of ordinary skill in the art that the 
class of polymerases as defined above will interact with acyclonucleotide 
substrates as shown in the above application. 

9. To further support the above statements, we have conducted additional
experiments to confirm that archaeal Family B polymerases with an amino
acid sequence identity of greater than 30% can utilize acyclonucleotides as a
substrate. These data are attached to the present declaration as Appendix 1.

10. I further declare under penalty of perjury pursuant to the laws of the United
States of America that the foregoing is true and correct and that the 
Declaration was executed by me on: 

The crystal structure of the family B DNA polymerase from the
hyperthermophilic archaeon Pyrococcus kodakaraensis KOD1 (KOD DNA
polymerase) was determined. KOD DNA polymerase exhibits the highest known
extension rate, processivity and fidelity. We carried out the structural
analysis of KOD DNA polymerase in order to clarify the mechanisms of
those enzymatic features. Structural comparison of DNA polymerases
from hyperthermophilic archaea highlighted the conformational difference
in the Thumb domains. The Thumb domain of KOD DNA polymerase
shows an "opened" conformation. The Fingers subdomain possesses
many basic residues at the side of the polymerase active site. The
residues are considered to be accessible to the incoming dNTP by
electrostatic interaction. A β-hairpin motif (residues 242-249) extends from
the Exonuclease (Exo) domain, as seen in the editing complex of the RB69
DNA polymerase from bacteriophage RB69. Many arginine residues are
located at the forked-point (the junction of the template-binding and
editing clefts) of KOD DNA polymerase, suggesting that the basic
environment is suitable for partitioning of the primer and template DNA
duplex and for stabilizing the partially melted DNA structure in
high-temperature environments. The stabilization of the melted DNA structure
at the forked-point may be correlated with the high PCR performance of
KOD DNA polymerase, which is due to its low error rate, high elongation
rate and processivity.

© 2001 Academic Press 

Keywords: archaea; crystal structure; family B DNA polymerase; "forked- 
point"; KOD DNA polymerase 



Introduction 

DNA polymerases are a group of enzymes that use single-stranded DNA as a
template for the synthesis of the complementary DNA strand. These
enzymes are multifunctional, with both synthetic (polymerase) and one or
two degradative modes (5'-3' and/or 3'-5' exonucleases), and play an
essential role in nucleic acid metabolism, including the processes of DNA
replication, repair and recombination. Many DNA polymerase genes have
been cloned and sequenced. Amino acid sequences deduced from their
nucleotide sequences can be classified into four major types: Escherichia
coli DNA polymerase I (family A), E. coli DNA polymerase II (family B),
E. coli DNA polymerase III (family C) and others (family X).¹ Recently, a
new family of DNA polymerases has been identified; all members of this
family contain five highly conserved motifs, I-V, and several of these
polymerases participate in lesion bypass.² This family is called the
UmuC/DinB family.³ Family B DNA polymerases include eukaryotic DNA
polymerases α, δ and ε, which are thought to be components of the
replisome and to carry out chromosomal DNA replication. Archaeal proteins
involved in gene expression, such as those for DNA replication,
transcription, and translation, have been found to be similar to those
from eucarya. Therefore, the archaeal system of gene expression is a
simplified model of the eukaryotic system. In contrast, the cellular
appearance and organization of archaea are more like those of bacteria.

The first crystal structure of a family B DNA polymerase to be obtained
was that of bacteriophage RB69 DNA polymerase (RB69 DNA polymerase).⁴
The first crystal structure of an archaeal DNA polymerase was that of the
DNA polymerase from Thermococcus gorgonarius (Tgo DNA polymerase).⁵ The
editing complex of RB69 DNA polymerase has been reported,⁶ and two
further crystal structures of archaeal family B DNA polymerases have
recently been reported: Tok DNA polymerase from Desulfurococcus sp. Tok⁷
and 9°N-7 DNA polymerase from Thermococcus sp. 9°N-7.⁸

Pyrococcus kodakaraensis KOD1 is a hyperthermophilic archaeon, with an
optimum growth temperature of 95 °C.⁹ Enzymes produced in KOD1 were
reported to be extremely thermostable and to have eukaryotic
characteristics.⁹ The optimum temperature of KOD DNA polymerase is
75 °C, similar to that of the DNA polymerase obtained from Pyrococcus
furiosus (Pfu DNA polymerase). KOD DNA polymerase, however, exhibits a
higher extension rate (100-130 nucleotides/second) and processivity
(>300 bases), five times and ten to 15 times higher than those of Pfu DNA
polymerase, respectively.¹⁰ Thermostable DNA polymerases are expected to
be suitable enzymes for the polymerase chain reaction (PCR); KOD DNA
polymerase is, therefore, suitable for DNA amplification by such means.
Indeed, KOD DNA polymerase is widely used in rapid and accurate PCR
systems (TOYOBO Ltd., Japan).
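
To make the rate comparison concrete, the following minimal sketch converts the quoted figures into extension times. It assumes only what the paragraph states (100-130 nucleotides/second for KOD, and a Pfu rate one fifth of that); the 1000-nucleotide template length and all names are hypothetical choices for illustration, and the calculation ignores processivity and the other steps of a PCR cycle.

```python
# Extension rates quoted in the text for KOD DNA polymerase (nucleotides/second).
KOD_RATE_NT_PER_S = (100, 130)
# The text states the KOD rate is five times that of Pfu DNA polymerase.
KOD_OVER_PFU_RATE = 5


def seconds_to_extend(length_nt: int, rate_nt_per_s: float) -> float:
    """Time needed to copy a template of the given length at a constant rate."""
    return length_nt / rate_nt_per_s


# Pfu rate range implied by the five-fold ratio.
pfu_rate = tuple(r / KOD_OVER_PFU_RATE for r in KOD_RATE_NT_PER_S)

template_nt = 1000  # hypothetical template length, not taken from the text
for name, (low, high) in {"KOD": KOD_RATE_NT_PER_S, "Pfu": pfu_rate}.items():
    fastest = seconds_to_extend(template_nt, high)
    slowest = seconds_to_extend(template_nt, low)
    print(f"{name}: {fastest:.1f}-{slowest:.1f} s per {template_nt} nt")
```

Under these assumptions the implied Pfu rate is 20-26 nucleotides/second.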

Although structures of three archaeal DNA polymerases have been
determined as described above, no structural information relating to
elongation rate, processivity or fidelity has been provided. We carried
out the structural analysis of KOD DNA polymerase in order to clarify the
mechanism of its enzymatic features, namely the highest extension rate,
processivity and fidelity. Here, we report the crystal structure of the
DNA polymerase from the hyperthermophilic archaeon Pyrococcus
kodakaraensis KOD1. The three-dimensional structure of this KOD DNA
polymerase may provide useful information to clarify the mechanisms for
rapid and accurate reaction. In addition, this information may contribute
to the improvement of the PCR properties of enzymes already in use, such
as thermostability, error rate, elongation rate and processivity, or to
the design of new enzymes for PCR as well as DNA replication by family B
DNA polymerases.

Results and Discussion 
Overall structure 

KOD DNA polymerase has a disk-like shape with dimensions of
60 Å × 80 Å × 100 Å and is made up of distinct domains and subdomains:
N-terminal (N-ter: 1-130, 327-368, violet), Exonuclease (Exo: 131-326,
blue), Polymerase (Pol) domain including the Palm and Fingers subdomains
(369-449, 500-587, brown; and 450-499, green, respectively) and the
Thumb domain including the thumb-1 and thumb-2 subdomains (588-774,
red) (Figure 1(a)). The polymerase active site, containing three conserved
carboxylates (Asp404, Asp540 and Asp542), is located in an anti-parallel
β-sheet in the Palm subdomain. The exonuclease active site contains two
conserved carboxylates (Asp141 and Glu143) and is located in an
anti-parallel β-sheet in the Exo domain. The polymerase and exonuclease
active sites on the molecular surface are indicated by P and E,
respectively (see Figure 4). Structural comparisons of archaeal DNA
polymerases (KOD, Tgo and 9°N-7 DNA polymerases) are shown in
Figure 1(b). The structural architectures of the proteins are identical,
but the orientation of the domains and subdomains is different. In the
case of the KOD DNA polymerase (red), the Thumb domain is shifted to make
an "open" conformation and the portion of the Palm domain neighboring the
root of the Thumb domain is slightly shifted as a result of the large
movement of the Thumb domain in comparison to other archaeal DNA
polymerases. Table 1 shows the averaged temperature factors of the
domains and subdomains in the crystal structure of KOD DNA polymerase.
The value for the Thumb domain was markedly higher than the others. The
structures of many residues in the Thumb-2 subdomain are not defined,
because the orientation of the subdomain is highly disordered. Therefore,
it is thought that the structure of KOD DNA polymerase described here
provides information for the DNA-free, most relaxed conformation. The
structure of the editing complex of RB69 DNA polymerase revealed that
newly synthesized duplex DNA is grasped by the Pol and Thumb domains.
Although the orientation of the Thumb domain is potentially highly
flexible, the orientation may be fixed when it binds to the
primer-template duplex.
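
The domain boundaries listed above can be written as a small lookup table, which makes it easy to check where a given residue falls. This is a minimal sketch, not part of the original analysis: the names `KOD_DOMAINS` and `domain_of` are hypothetical, and the first Palm range is taken as 369-449, the reading that leaves no gap before the Fingers subdomain at residue 450.

```python
# Inclusive residue ranges for KOD DNA polymerase, as listed in the text.
# Assumption: the first Palm range is 369-449 (contiguous with Fingers at 450).
KOD_DOMAINS = {
    "N-ter": [(1, 130), (327, 368)],
    "Exo": [(131, 326)],
    "Palm": [(369, 449), (500, 587)],
    "Fingers": [(450, 499)],
    "Thumb": [(588, 774)],
}


def domain_of(residue: int) -> str:
    """Return the domain or subdomain that contains the given residue number."""
    for name, ranges in KOD_DOMAINS.items():
        if any(start <= residue <= end for start, end in ranges):
            return name
    raise ValueError(f"residue {residue} is outside the mature enzyme (1-774)")


# Active-site carboxylates named in the text.
for label, number in [("Asp141", 141), ("Glu143", 143),
                      ("Asp404", 404), ("Asp540", 540), ("Asp542", 542)]:
    print(label, "->", domain_of(number))
```

With these ranges every residue from 1 to 774 is assigned to exactly one domain, and the catalytic carboxylates fall in the Exo domain and the Palm subdomain, as the text describes.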

Polymerase domain 

The Pol domain is made up of the Fingers and Palm subdomains and has an
"L-like" shape (Figure 2(a)). The polymerization mechanism has been
studied mainly in family A DNA polymerases (Pol-I). A structural basis
for a metal-



Table 1. Averaged temperature factors

Domain          Temperature factor (Å²)
N-ter           38.1
Exo             55.7
Pol
  Fingers       49.5
  Palm          52.8
Thumb           93.7
Overall         55.9
Figure 1. (a) Overall structure of KOD DNA polymerase. The structure is
composed of domains and subdomains, which are the N-terminal (N-ter,
violet), Exonuclease (Exo, blue) and Polymerase (Pol) domains, the latter
including the Palm (brown) and Fingers (green) subdomains, and the Thumb
domain (red), including the Thumb-1 and Thumb-2 subdomains. Conserved
carboxylate residues in the Polymerase and Exonuclease active sites are
shown by ball-and-stick models. (b) Conformational comparison of the
Thumb domains among three archaeal DNA polymerases. Red, KOD DNA
polymerase; blue, Tgo DNA polymerase; and green, 9°N-7 DNA polymerase.
The comparison shows that the Thumb domain of KOD DNA polymerase displays
the most "opened" conformation.



assisted mechanism of phosphoryl transfer was provided by the
bacteriophage T7 DNA replication complex.¹¹ The complex structure shows
that two metal ions are bound by strictly conserved carboxylates (Asp475
and Asp654, which correspond to Asp404 and Asp542 in KOD DNA polymerase)
extending from the anti-parallel β-sheet of the Palm domain. The
phosphate group of the incoming ddGTP is held by the metal ions and the
four basic residues extending from the Fingers subdomain (His506, Arg518
and Lys522). The crystal structures of two ternary complexes of the large
fragment of
Figure 2. (a) Ribbon representation of the Pol domain. The domain is made
up of the Fingers and Palm subdomains. Conserved carboxylate residues
(D404, D540 and D542) are represented by ball-and-stick models. Basic
residues, which stand in a line in the Fingers subdomain facing the
polymerase active site, are represented by ball-and-stick models. K464 is
replaced by alanine because of the ambiguity of its electron density. Two
disulfide bonds are displayed (C428-C442 and C506-C509). Aromatic
residues adjacent to a glycine residue, represented by ball-and-stick
models, are localized in the joints of the subdomains. The Thumb domain
is represented by the semitransparent model. The Cα atoms of the
nucleophilic residues, S407 (Pko Pol-1), S492 (Tli Pol-1) and T541 (Tli
Pol-2), are represented by violet spheres. (b) Exonuclease domains of KOD
DNA polymerase and RB69 DNA polymerase (semitransparent model). Conserved
carboxylate (D141 and E143) and arginine residues (R243, R247, R265,
R266, R346, R381 and R501) in the forked-point of KOD DNA polymerase are
represented by ball-and-stick models. The red strands are the β-hairpin
motif partitioning the template-binding and editing clefts. The loop
containing Phe152 is shown in orange. F123 of RB69 DNA polymerase and
F152 of KOD DNA polymerase are represented by semitransparent and opaque
ball-and-stick models, respectively.
Thermus aquaticus DNA polymerase I (Klentaq1) with a primer-template DNA
and ddCTP have been reported.¹² The ternary complexes suggest that basic
residues of the Fingers subdomain hold the phosphate group of the
incoming dNTP and that the domain undergoes a conformational change to
deliver the incoming nucleotide to the active site. In the case of family
B DNA polymerases, the Fingers subdomain is composed mainly of two long
helices and does not have the joint that appears in the structures of
family A DNA polymerases. Therefore, it seems that in the case of
archaeal DNA polymerases, the movement of the Pol domain to deliver dNTP
to the active site differs from that of family A DNA polymerases. A
kinetic study of RB69 DNA polymerase mutants revealed that four residues
(Arg482, Lys486, Lys560 and Asn564) of the Fingers subdomain affected
dNTP incorporation.¹³ The residues are conserved in family B DNA
polymerases, and correspond to Arg460, Lys464, Lys487 and Asn491 in KOD
DNA polymerase, respectively. Furthermore, Lys468, Arg476, Lys477 and
Arg484 are located at the tip of the Fingers subdomain on the side of the
polymerase active site in KOD DNA polymerase (Figure 2(a)). It is
expected that the "queue" of basic residues captures the incoming dNTPs,
and that the dNTP is then delivered toward the polymerase active-site
center by the accompanying movement of the polymerase domain. Two
disulfide bonds exist in the connection site between the Palm and Fingers
subdomains (Figure 2(a); Cys428-Cys442 and Cys506-Cys509). The two
disulfide bonds are also found in the crystal structures of Tgo, Tok and
9°N-7 DNA polymerases. The sequence alignment for archaeal DNA
polymerases shown in Figure 3 suggests the potential for the formation of
disulfide bonds at the same sites. It is thought that the disulfide bonds
are required to maintain the structure of the Fingers and Palm subdomains
at extremely high temperatures. Sequence comparison suggests that the
number of disulfide bonds is correlated with the optimum growth
temperatures of the organisms. DNA polymerases from Thermococcus
litoralis, Methanococcus jannaschii and Archaeoglobus fulgidus, with
optimum growth temperatures of 85, 85 and 83 °C, respectively, are
expected to have one disulfide bond, because Cys506 is replaced by
serine in T. litoralis and M. jannaschii, and Cys442
is replaced by asinine in A. fulgidus. DNA poly- 
merase from Methanobacterium thermoautotrophicum, 
with an optimum growth temperature of 65 °C, is 
expected to have no disulfide bond, because 
Cys428, Cys442 and Cys506 are replaced by gluta- 
mic acid, arginine and serine, respectively. 
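
The disulfide-bond reasoning in this paragraph amounts to a simple counting rule: a bond is expected only if both partner positions hold cysteine. The sketch below applies that rule to the substitutions named in the text. It is illustrative only: the function and variable names are hypothetical, positions not mentioned as replaced are assumed to be cysteine, and "X" stands for a non-cysteine residue whose identity does not matter for the count.

```python
# The two disulfide pairs reported for KOD DNA polymerase (KOD numbering).
DISULFIDE_PAIRS = [(428, 442), (506, 509)]


def expected_disulfides(residues: dict) -> int:
    """Count the pairs in which both positions hold cysteine ("C")."""
    return sum(
        1
        for a, b in DISULFIDE_PAIRS
        if residues.get(a) == "C" and residues.get(b) == "C"
    )


# One-letter codes at the four positions. Positions the text does not list
# as replaced are ASSUMED to be cysteine; "X" marks a non-cysteine residue.
polymerases = {
    "P. kodakaraensis (KOD)": {428: "C", 442: "C", 506: "C", 509: "C"},
    "T. litoralis": {428: "C", 442: "C", 506: "S", 509: "C"},
    "M. jannaschii": {428: "C", 442: "C", 506: "S", 509: "C"},
    "A. fulgidus": {428: "C", 442: "X", 506: "C", 509: "C"},
    "M. thermoautotrophicum": {428: "E", 442: "R", 506: "S", 509: "C"},
}

for organism, residues in polymerases.items():
    print(organism, expected_disulfides(residues))
```

The counts reproduce the expectation stated in the text: two bonds for KOD, one for the T. litoralis, M. jannaschii and A. fulgidus enzymes, and none for the M. thermoautotrophicum enzyme.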

Archaeal DNA polymerases have characteristic sequences of aromatic
residues adjacent to glycine residues (Figure 3). These are localized at
the hinges of the Palm subdomain at the connections to the Fingers and
Thumb-1 subdomains (Figure 2(a)). These aromatic residues may provide a
flexible aromatic environment because of the adjoining glycine residues.
This may contribute to the conformational changes of the Pol domain in
polymerization.

The 3'-5' exonuclease domain

DNA is synthesized by competition between the rates of polymerase and
exonuclease activities at the newly synthesized 3' terminus of the
primer. Misincorporation of a nucleotide destabilizes the structure of
duplex DNA at the 3' terminus of the primer. This decreases the rate of
nucleophilic attack on the α-phosphate group of the incoming dNTP by the
primer 3'-OH and allows excision of the incorrect nucleotide by the
proofreading exonuclease. The excision requires the movement of the 3'
terminus to the exonuclease active site accompanied by rewinding of the
duplex DNA, because the exonuclease active site is set apart from the
polymerase active site. In KOD DNA polymerase, the exonuclease active
site is set apart from the polymerase active site by approximately 40 Å.
The editing complex of RB69 DNA polymerase shows structural similarity to
the editing mode of family B DNA polymerase.⁶ The DNA polymerase binds
the mismatched primer-template DNA, which is partially denatured; the 3'
end of the primer strand is bound at the exonuclease site. Residues
251-262 of RB69 DNA polymerase, which form an extended β-hairpin
structure that juts directly out from the protein surface and projects
into the DNA, stabilize the partially denatured or melted structure.
Arg260, extending from the β-hairpin motif, plays an important role.
Arg260 and Phe123 appear to block the template strand by making
interactions with the penultimate base at the 3' end of the
primer-template. Arg260 and Phe123 in RB69 DNA polymerase correspond to
Arg247 and Phe152 in KOD DNA polymerase, respectively. Figure 2(b) shows
the structural comparison of the Exo domains of KOD and RB69 DNA
polymerases. Molecular surface and electrostatic potentials are shown in
Figure 4. The β-hairpin motif in KOD DNA polymerase corresponds to
residues 242-249, and Arg247 extends to the forked-point, which is the
junction of the template-binding and editing clefts (T-cleft and E-cleft,
respectively) (Figure 4). It seems that Arg247 can separate the template
strand from the primer strand and stabilize the melted structures of the
strands in a manner similar to that of the RB69 DNA polymerase. As Phe152
is set apart from the active site, it is apparently unable to make an
aromatic interaction with the base of the primer. Based on the above
idea, the movement of the loop including Phe152 (Figure 2(b)) is required
to interact with the primer bound at the E-cleft. Furthermore, Arg243
extends from the β-hairpin structure to the T-cleft. Arg243 interacts
with the template strand to fix it at the T-cleft. In addition to Arg243
and Arg247, five arginine residues gather at the forked-point in KOD DNA
polymerase (Arg265, Arg266, Arg346, Arg381 and Arg501) and provide a
basic environment (Figures 2(b) and 4). It seems that they can interact
with the phosphate
Figure 3. Sequence alignment of archaeal DNA polymerases. The abbreviations used are as follows: Pko, Pyrococcus
kodakaraensis; Pfu, Pyrococcus furiosus; Tli, Thermococcus litoralis; Mja, Methanococcus jannaschii; Afu, Archaeoglobus
fulgidus; and Mth, Methanobacterium thermoautotrophicum. Homologous residues are masked in gray. Remarkable residues
are highlighted in reverse type. Conserved carboxylate residues in the Exonuclease and Polymerase active sites are
shown in red. Basic residues gathering in the forked-point and Fingers subdomain are shown in blue. R243, R247,
R265, R266, R346, R381 and R501 are located in the forked-point. R460, K464, K468, R476, K477, R484 and K487 are
located in the Fingers subdomain and face into the polymerase active site. Cysteine residues forming (or possibly
forming) disulfide bonds are shown in green. Nucleophilic residues in the self-splicing reaction are shown in violet.
Inteins intervene before the nucleophilic residues. Aromatic residues adjacent to glycines are shown in orange.



groups of the DNA strand and stabilize the melted 
structure of DNA strands at the forked-point. 
Several arginine residues at the forked-point are 
conserved in known family B DNA polymerases 
from hyperthermophilic archaea. 

In DNA synthesis, the structure of DNA is variable at the stage of
switching between the elongation and editing modes. Hyperthermophiles
must have mechanisms to protect their genomic DNA against thermal
denaturation. The genomic DNA of hyperthermophilic archaea has
nucleosome-like structures brought about by interaction with histone-like
proteins.¹⁴ Nevertheless, at the replication fork, the DNA strands are
exposed. Therefore, DNA polymerases of hyperthermophilic archaea are
required to stabilize the exposed or melted DNA structure in the high
temperature environment. The stabilization by DNA polymerase may
correlate with the enzymatic characteristics of DNA polymerase such as
half-life period of activity, error rate, elongation rate and
processivity. As discussed above, it is considered that the arginine
residues around the "forked-point" have a remarkable effect on the
stability of DNA structure. In the forked-point of Pfu DNA polymerase,
Arg247, Arg265 and Arg501 are replaced by methionine, threonine and
lysine, respectively. Therefore, the replacements may contribute to the
difference in enzymatic characteristics between KOD and Pfu DNA
polymerases. Additional experiments such as site-directed mutagenesis,
together with enzymatic studies of DNA polymerases, are necessary to
clarify the role of the residues at the forked-point.

Figure 4. Molecular surface with electrostatic potential
map around the forked-point. The red and blue surfaces
are acidic and basic regions, respectively. Domains and
subdomains are labeled with orange letters. Polymerase
and Exonuclease active sites are labeled with P and E,
respectively. The β-hairpin is labeled with β.



Extein connection site 

The KOD DNA polymerase gene encodes a precursor protein of 1671 amino
acid residues. The precursor protein is processed precisely into three
parts by protein splicing. The self-splicing reaction yields the mature
KOD polymerase (774 residues) and two intervening protein domains (termed
inteins), PI-PkoI (360 residues) and PI-PkoII (537 residues), as a result
of the ligation of the external N and C-terminal domains (termed
exteins).¹⁰,¹⁵ All known precursor proteins contain conserved amino acids
at self-splicing sites: serine, threonine or cysteine (nucleophiles) at
the intein N terminus, and a His-Asn pair at the intein C terminus
followed by serine, threonine or cysteine (nucleophiles) at the C-extein
N terminus.¹⁶ The traces of the protein splicing reaction in KOD DNA
polymerase are Ser407 and Ser492, which are located at the N terminus of
the C-extein. In the crystal structure of KOD DNA polymerase, the
nucleophilic residues are found in the Pol domain (Figure 2(a)).
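
As a quick consistency check on the residue counts quoted above, the mature polymerase and the two inteins should add up to the precursor length. The sketch below uses only the four numbers given in the text; the variable names are hypothetical.

```python
# Residue counts quoted in the text.
PRECURSOR_RESIDUES = 1671
MATURE_POLYMERASE_RESIDUES = 774
INTEIN_RESIDUES = {"intein_1": 360, "intein_2": 537}

# Splicing removes the inteins and ligates the exteins, so the parts
# should account for every residue of the precursor.
total = MATURE_POLYMERASE_RESIDUES + sum(INTEIN_RESIDUES.values())
intein_fraction = sum(INTEIN_RESIDUES.values()) / PRECURSOR_RESIDUES

print(f"mature enzyme + inteins = {total} residues")
print(f"fraction of precursor removed as inteins = {intein_fraction:.1%}")
```

The sum is 774 + 360 + 537 = 1671, matching the precursor length, so slightly more than half of the translated polypeptide is excised as intein sequence.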

Self-splicing sites in archaeal family B DNA polymerases (α family) are
classified into three types: Pko Pol-1, Tli Pol-1 and Tli Pol-2 (The
Intein Database, http://www.neb.com/neb/inteins.html). The nucleophilic
residues, serine or threonine, in the three sites are mapped in
Figure 2(a). In the case of KOD DNA polymerase, PI-PkoI intervenes in the
Pko Pol-1 site and PI-PkoII intervenes in the Tli Pol-1 site. The
structure shows that they are localized around the polymerase active site
in the Palm domain. Although they are exposed to solvent, they are
surrounded by the Fingers subdomain and the Thumb domain. The two inteins
cannot exist in the space because of steric hindrance. Therefore, it is
necessary that the folding of inteins and the subsequent self-excisions
are carried out before the extein is folded.

Materials and Methods 

Crystallization 

KOD DNA polymerase was overexpressed in E. coli BL21(DE3) and purified by
the previously reported method.¹⁰ The crystals of KOD DNA polymerase were
grown by the previously reported method.¹⁷ KOD DNA polymerase was
concentrated up to an A280 nm of about 25. Crystals of KOD DNA polymerase
suitable for diffraction experiments were obtained at 293 K with hanging
drops of 2 µl of protein solution and 2 µl of reservoir solution
containing 100 mM sodium citrate buffer (pH 5.5) and 25-30% (v/v)
2-methyl-2,4-pentanediol (MPD), equilibrated against the reservoir
solution.

Data collection 

X-ray diffraction measurements were performed at beamline 18B of the
Photon Factory at the High Energy Accelerator Research Organization,
Tsukuba Science City, Japan. Each crystal of KOD DNA polymerase was
picked up directly with a nylon fiber loop from a drop of mother liquor;
the crystal was then rapidly transferred to the N₂ gas stream. The
incident beam, with a wavelength of 1.00 Å, was collimated to 0.2 mm in
diameter. Intensity data were collected on 200 mm × 400 mm imaging plates
(Fuji Film Company Ltd.) using the Weissenberg camera for macromolecules
with a radius of 430 mm¹⁸,¹⁹ and the oscillation method with 3° rotation
per frame. The crystals diffracted at least to 2.8 Å resolution at 100 K.
X-ray diffraction data were processed and scaled with the programs DENZO
and SCALEPACK.²⁰ The diffraction data were scaled with a zero σ cutoff.
Unit-cell parameters were determined as a = 111.9 Å, b = 112.4 Å and
c = 73.9 Å with the space group P2₁2₁2₁. The unit-cell parameters gave a
Matthews coefficient of 2.60 Å³ Da⁻¹ and a solvent content of 52.2%
(v/v).²¹ The final data set consisted of 119,205 measurements of 20,298
unique observed reflections with an overall R_merge of 8.4% and 34.5% in
the outermost resolution shell (2.90-2.80 Å). This represents 88.1% of
theoretically observable reflections at 2.8 Å resolution. The outermost
resolution shell of data is 83.7% complete.
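
The crystal-packing and redundancy figures quoted above follow from short calculations, sketched below. Several inputs are assumptions that the text does not state: a molecular mass of about 90 kDa for the 774-residue enzyme, one molecule per asymmetric unit (so Z = 4 in an orthorhombic P2(1)2(1)2(1) cell), and the common approximation that the solvent fraction is 1 - 1.23/V_M. The function name is hypothetical.

```python
def matthews(a: float, b: float, c: float, z: int, mass_da: float):
    """Cell volume (A^3), Matthews coefficient (A^3/Da) and solvent fraction
    for an orthorhombic cell holding z molecules of the given mass."""
    volume = a * b * c                 # orthorhombic: all angles are 90 degrees
    v_m = volume / (z * mass_da)       # Matthews coefficient
    solvent = 1.0 - 1.23 / v_m         # standard approximation for proteins
    return volume, v_m, solvent


# Unit-cell edges from the text; Z and the mass are ASSUMPTIONS (see lead-in).
volume, v_m, solvent = matthews(111.9, 112.4, 73.9, z=4, mass_da=90_000)

# Average multiplicity: total measurements per unique reflection (from the text).
redundancy = 119_205 / 20_298

print(f"cell volume      = {volume:,.0f} A^3")
print(f"V_M              = {v_m:.2f} A^3/Da")
print(f"solvent fraction = {solvent:.1%}")
print(f"redundancy       = {redundancy:.2f}")
```

Under these assumptions the results land close to the reported 2.60 Å³ Da⁻¹ and 52.2%; the small differences depend on the exact molecular mass used.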

Structure determination 

The crystal structure of KOD DNA polymerase was solved by molecular
replacement with the AMoRe program.²² The structure of Tgo DNA polymerase
(PDB code 1TGO) reduced to polyalanine was used as the search model. Data
in the resolution range of 20.0-3.5 Å were used in both the rotation and
translation functions. Results are discussed in terms of the AMoRe
correlation coefficient (CC). Using a Patterson cut-off radius of 36 Å, a
list of 20 rotation function peaks was obtained, with the top peak having
an AMoRe CC value of 13.8. The top solution by the translation function
had a CC of 43.3 with an R-factor of 54.1%. At this stage, the electron
density of the Thumb domain was very ambiguous. Therefore, structural
refinement of the initial stage was carried out with

Figure 5. The final 2Fo - Fc map around the Fingers
and Palm subdomains. The map is contoured at 1 σ.



a model omitting the Thumb domain. The model was manually modified using
the program O²³ and subjected to further rounds of refinement using data
in the resolution range 40.0-3.0 Å with the program CNS.²⁴ The final
R-factor is 23.1% and R_free is 31.3%, with r.m.s. deviations for bond
lengths and bond angles being 0.007 Å and 1.1°, respectively. The 50
residues at the tip of the Thumb domain are not included in the final
model due to poorly defined electron density. Figure 5 shows the final
2Fo - Fc map superimposed on the refined final coordinates of KOD DNA
polymerase.



Protein Data Bank accession code 

Refined coordinates and structure factors have been
deposited in the RCSB Protein Data Bank under the
accession code 1GCX.



Figure preparation 

Figures 1 and 2 were prepared using the programs
MOLSCRIPT²⁵ and Raster3D.²⁶,²⁷ Figure 4 was prepared
by GRASP.²⁸ Figure 5 was prepared using the program

ABSTRACT Most known archaeal DNA polymerases belong to the type B
family, which also includes the DNA replication polymerases of
eukaryotes, but maintain high fidelity at extreme conditions. We describe
here the 2.5 Å resolution crystal structure of a DNA polymerase from the
archaeon Thermococcus gorgonarius and identify structural
features of the fold and the active site that are likely respon- 
sible for its thermostable function. Comparison with the 
mesophilic B type DNA polymerase gp43 of the bacteriophage 
RB69 highlights thermophilic adaptations, which include the 
presence of two disulfide bonds and an enhanced electrostatic 
complementarity at the DNA-protein interface. In contrast to 
gp43, several loops in the exonuclease and thumb domains are 
more closely packed; this apparently blocks primer binding to 
the exonuclease active site. A physiological role of this 
"closed" conformation is unknown but may represent a poly- 
merase mode, in contrast to an editing mode with an open 
exonuclease site. This archaeal B DNA polymerase structure 
provides a starting point for structure-based design of poly- 
merases or ligands with applications in biotechnology and the 
development of antiviral or anticancer agents. 



Propagation of cells requires faithful DNA replication. This is 
performed in vivo by DNA polymerases (pols), which attach 
the appropriate dNTP to the nascent DNA primer strand to 
match its paired template. Different families of pols are 
involved in different DNA polymerization processes including 
not only DNA replication (1, 2) but also repair and recombi- 
nation (3, 4), a heterogeneity also reflected by varying 
polypeptide structures and/or subunit compositions (3, 5). 
Some pols complement polymerase activity with 3'→5'
exonuclease activity (editing activity) and/or 5'→3'
"structure-specific endonuclease" activity, often located in separate
structural domains on the same polypeptide chain (4-8).

Crystal structures are available for most known polymerase 
families, including the A family DNA polymerases (9-14), pol
β (15-17), HIV reverse transcriptase (18-20), and recently,
the B family pol gp43 from bacteriophage RB69 (21). All share 
a functional polymerase structure, which resembles a right 
hand built by the palm, fingers and thumb domains (see ref. 7 
for review). Although the fingers and thumb domains are 
highly diverse among the different families, the palm domains, 
which contain the conserved catalytic aspartate residues, show 
a similar topology among all families except pol /3. The 
polymerase nucleotidyl transfer was studied in detail for the A 
family polymerases, HIV reverse transcriptase, and pol /3, and 
was shown to involve two metal ions (summarized in ref. 7). 

Considerably less is known for the family of type B pols,
which are replicative enzymes in eukaryotes and most likely
also Archaea (22, 23). The structure of gp43 from bacteriophage
RB69 (21) provided an excellent first insight into this
family. In addition to the three polymerase domains, gp43
contains a 3' → 5' exonuclease domain and an N-terminal
domain. The exonuclease and palm domains share the topology
and active site of A family enzymes, implying similar
metal-assisted mechanisms for polymerase and exonuclease
activities (21). The thumb and finger domains are apparently
unrelated to the other polymerase families. The function of the
N-terminal domain remains unknown, but it may help assemble
the multicomponent replication apparatus (21).

Much is known about the replication of phages (24-26),
viruses (1, 27), Prokaryota (28), and Eukaryota (1, 3, 29, 30),
which in general involves pols but also primases, helicases,
RNaseH, sliding clamps, and other factors (31). Considerably
less is known for archaeal replication, where mostly B type
polymerases, similar to eukaryotic replication enzymes pol α
and δ, have been identified (6, 22, 23, 32-34). This relative
ignorance is surprising, because such crucial biotechnological
applications as cloning and PCR require the thermostability
and fidelity typical of archaeal polymerases (6). Thus, in
addition to satisfying basic research interests, structural
information could assist, for example, the engineering of variant
enzymes with tailored nucleotide incorporation rates or the
design of antiviral and anticancer polymerase inhibitors. For
these reasons, we have determined the structure of a DNA
polymerase from Thermococcus gorgonarius (Tgo), an extremely
thermophilic sulfur-metabolizing archaeon isolated
from a geothermal vent in New Zealand (35). This enzyme
possesses pol and 3' → 5' exonuclease activities, which
together ensure thermostable replication with high fidelity
(error rate: 3.3-2.2 × 10⁻⁶; see ref. 36). The 2.5 Å structure
shows a topological similarity to gp43 and gives insight into the
structural biology of archaeal DNA polymerases, including the
identification of several mechanisms for thermophilic adaptation.

MATERIALS AND METHODS 

Materials. All materials were of the highest grade commer- 
cially available. 

Bacterial Strains. Escherichia coli LE392 containing 
pUBS520 was used as described (36). E. coli B834 (DE3) (hsd 
metB) was a generous gift of Nediljko Budisa (Max-Planck- 
Institut). 

Expression Vectors. pBTac2 was obtained from Roche
Molecular Biochemicals.



Abbreviations: Tgo, Thermococcus gorgonarius; pol, DNA polymerase.
Data deposition: The atomic coordinates have been deposited in the
Protein Data Bank, Biology Department, Brookhaven National Laboratory,
Upton, NY, 11973 (PDB ID code 1TGO).
†Present address: The Scripps Research Institute, La Jolla, CA 92037.
*To whom reprint requests should be addressed, email: hopfner@

Heavy-atom derivatives were prepared by soaking the crystals in low-salt buffer containing the heavy atom as follows: U, 0.5 mM uranyl acetate
for 2 h; PT1, 5 mM K2PtCl4 for 1 d; PT2, 5 mM K2PtCl6 for 2 d; PT3, saturated cis-dichlorodipyridine-Pt(II) for 2 d; PT4, 5 mM K2PtCl6 for 7
d; PT5, saturated cis-dichlorodipyridine-Pt(II) for 7 d; PTU, PT1 + U; PB, saturated dinitrophenyl-Pb(NO3)2 for 7 d; PBPT, PB + PT5; OS,
saturated K2OsO4 for 7 d. NAT1, PT1, PT2, PT3, PTU, PB and PBPT were collected with a Mar imaging plate and U was collected with a Bruker
AXS area detector on a Rigaku rotating anode source. All other data sets were collected with a Mar charge-coupled device at beamline BW6 at
DESY, Hamburg.



Cloning, Expression, and Protein Purification. The gene for 
the 89.8-kDa DNA-dependent pol (Tgo pol) was cloned from 
Tgo (Deutsche Sammlung von Mikroorganismen no. 8976) and 
expressed in E. coli LE392pUBS520 pBtac2Tgo (Deutsche
Sammlung von Mikroorganismen no. 11328) as described (36). 
Tgo pol was purified essentially as described (36) with the 
substitution of the TSK butyl Toyopearl column by Blue- 
Trisacryl M (Serva) and with an additional concentration step 
on Poros 50 HQ anion exchange medium (Roche Molecular 
Biochemicals). Active fractions were combined, concentrated 
to 30 mg/ml, and transferred to 20 mM sodium phosphate, pH 
8.2/10 mM 2-mercaptoethanol/500 mM NaCl. 

The gene for a selenomethionine-containing variant of Tgo
pol (Se-Tgo pol) was expressed in E. coli B834 (DE3) (hsd
metB) using a published protocol (37). Se-Tgo pol was purified
by using the wild-type protocol.

Crystallization. Crystals of purified Tgo pol (or Se-Tgo pol)
were grown by using the sitting-drop vapor-diffusion technique at
4°C with high-salt conditions (2:2 µl protein:reservoir; 100
mM Tris, pH 8/2.0 M ammonium sulfate) and diffracted to 3.0
Å (in-house) and to 2.5 Å [beamline BW6 at Deutsches
Elektronen Synchrotron (DESY), Hamburg]. Low-salt conditions
(100 mM Tris, pH 7.0/200 mM ammonium sulfate/30%
PEG 400) yielded only poorly diffracting crystals but allowed
soaks (including heavy atoms) with some cell constant modulation
(a, b, c = 63.6, 105.0, 160.5) but minimal loss of
resolution.

Data Collection and Processing. Data were collected with a
MAR imaging plate or a Bruker AXS X1000 mounted on a
Rigaku rotating anode source, or with a MAR imaging plate
or a MAR CCD (charge-coupled device) at beamline BW6 at
DESY, Hamburg. The data were processed with SAINT (Bruker
AXS), MOSFLM (MAR CCD; ref. 38), or DENZO (MAR imaging
plate; ref. 39), scaled with SCALA (40) or SCALEPACK (39), and
reduced with TRUNCATE (40).

Structure Determination. The structure was solved by multiple
isomorphous replacement and anomalous scattering
(MIRAS) by using data from crystals transferred to low-salt
conditions (Table 1). Crystallographic calculations were done
with programs from the CCP4 suite (40). Heavy atom positions
of major sites were located in difference Patterson maps and
were refined with MLPHARE (40) to calculate protein phase
angles to 3.5 Å resolution. A partial polyalanine model was
built into interpretable portions of secondary structural elements
of the MIRAS map by using MAIN (41). The quality of the
electron density was improved by phase combination of the
partial model with the experimental phases by using SIGMAA
(40), and several cycles of solvent flattening to 3.0 Å by using
SOLOMON (40). At this stage, no interpretable density was
found for a significant portion of the molecule, comprising
residues 147-154, 283-306, 653-728, and 752-773.

Model Building and Refinement. The partial model (R
factor 35%) was used to phase the 2.5 Å resolution data of the
Se-Tgo pol (high-salt conditions). The model was oriented with
AMORE (40). The correlation coefficient of 22.0% and the R
factor of 50.3% showed divergence of the high- and low-salt
structures. After bulk solvent correction, anisotropic B factor
correction and rigid-body minimization (treating five domains
independently), the partial model was iteratively refined and
extended with simulated annealing, Powell minimization,
restrained individual B factor refinement with CNS (42), and
manual model building with MAIN (41) by using data from
25.0-2.5 Å resolution (Table 2, Fig. 1).

RESULTS AND DISCUSSION 

Structure of Tgo pol. Tgo pol is a ring-shaped molecule with
dimensions 50 Å × 80 Å × 100 Å. The single polypeptide chain
of 773 aa is folded into five distinct structural domains (Fig. 2):
the N-terminal domain (residues 1-130), the 3' → 5' exonuclease
domain (131-326), the palm (369-449 and 500-585),
fingers (450-499), and thumb (586-773) domains of the
polymerase unit, and a helical interdomain insertion (327-368)
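The domain boundaries quoted for Tgo pol lend themselves to a small lookup table. The sketch below is only an illustration: the names `TGO_DOMAINS` and `domain_of` are hypothetical, and the residue ranges are copied from the text rather than derived from the deposited coordinates.

```python
# Domain boundaries of Tgo pol as quoted in the text (1-based, inclusive).
# TGO_DOMAINS and domain_of are illustrative names, not from the paper.
TGO_DOMAINS = {
    "N-terminal": [(1, 130)],
    "exonuclease": [(131, 326)],
    "interdomain insertion": [(327, 368)],
    "palm": [(369, 449), (500, 585)],
    "fingers": [(450, 499)],
    "thumb": [(586, 773)],
}


def domain_of(residue: int) -> str:
    """Return the name of the domain whose range contains the residue."""
    for name, ranges in TGO_DOMAINS.items():
        for start, end in ranges:
            if start <= residue <= end:
                return name
    raise ValueError(f"residue {residue} is outside the 1-773 chain")


if __name__ == "__main__":
    # Residues mentioned elsewhere in the text.
    for res in (141, 404, 487, 542):
        print(res, domain_of(res))
```

Storing the palm as two ranges reflects the fact that the fingers (450-499) are an insertion within it.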

Table 2. Crystallographic refinement, high-salt form

Fig. 3. Sequence alignment of B family DNA polymerases. The
alignment has been adapted from ref. 21 to highlight specific residues
from the class of archaeal pols. The secondary structure of Tgo pol is
indicated on top of the alignment with helices (bars), strands (arrows)
and loops (lines) colored according to domains with the same color
code as Fig. 1B. Strictly conserved residues of type B polymerases are
red, and additional conserved residues are yellow. Uniquely conserved
residues of archaeal type B enzymes, as discussed in the text, are
green. Disulfide bonds are shown by a bar on top of the alignment.
Abbreviations: tgo, Thermococcus gorgonarius pol; pfu, Pyrococcus
furiosus pol; tsp, Thermococcus sp. pol; tli, Thermococcus litoralis pol;
mvo, Methanococcus voltae pol; RB69, bacteriophage RB69 pol; T4,
bacteriophage T4 pol; Eco, E. coli pol II; pol δ, human pol δ.

packed against the five-stranded antiparallel β-sheet that
contains the three conserved aspartate residues involved in
nucleotidyl transfer. The fingers emerge from the palm domain
as an α-helix-rich insertion. Its 50 residues are folded into two
antiparallel coiled α-helices of approximately equal size: helix
P contains the conserved KX3NSXYGX2G motif of B type
polymerases and is related to the O helix of A type enzymes
(see below). The ≈50-residue insertion between helix O and P
in RB69 and T4 gp43 is missing in Tgo pol, where both helices
and the 4-residue linker are much shorter than their equivalents in
gp43. The shorter fingers of Tgo pol presumably reflect the
typical structure of the nonbacteriophage B type fingers (pol
α, pol δ, and E. coli pol II). The thumb domain topology,
similar to that of gp43, is unrelated to other polymerase types.
However, like the thumb of A type enzymes, a bundle of
α-helices at its base protrudes from the active site β-sheet.
Distal to the active site, the thumb contains a 75-residue
subdomain (665-729), which fixes the exonuclease domain and
contributes to the editing channel, explaining why mutations in
the exonuclease domain of B-type polymerases affect the
polymerase activity and vice versa (44, 45).

Weakly defined density across the base of the thumb domain
was modeled as the C-terminal 6 residues with a polyalanine
chain. The C terminus thus does not protrude from the core
molecule as in the RB69 polymerase (21). Because the C-terminal
residues of the T4 pol are involved in sliding-clamp binding
(46), it is likely, however, that these residues become ordered
on any similar holoenzyme formation.

Sequence Alignment of Archaeal DNA Polymerases. The 
structure of Tgo pol allows the generation of a structure based 
sequence alignment of the archaeal subfamily of type B DNA 
polymerases, the location of conserved and unique residues, 
and the comparison with other type B DNA polymerases (Fig. 
3). 

Polymerase Active Site. The polymerase active site is formed
by the central β-sheet (strands 16, 17, 20, 21, and 22) and helix
N of the palm domain and helix P located in the fingers and is
highly conserved among B family polymerases (Fig. 4). Three
carboxylates required for nucleotidyl transfer in B family
polymerases, two of which coordinate two metal ions (14), are
superimposably conserved among A family enzymes, B family
enzymes, and reverse transcriptase (21). Superposition of Tgo
pol and T7 replication complex (14) places the dNTP near the
proposed nucleotide-binding site in helix P, the
K487x3NSxYGx2G motif (Fig. 3), and suggests interactions of
the carboxylates with metals and the phosphate tail of the
bound dNTP (Fig. 4). Reorientation of the strictly conserved
Lys-487 allows it to mimic the Lys-522-phosphate tail interaction
in T7. Tyr-494 (Kx3NSxY494Gx2G) and Tyr-409
(SLY409PSII) form the bottom of the nucleotide-binding site.
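The Kx3NSxYGx2G motif of helix P can be written as a regular expression and searched for in a protein sequence. The following is a minimal sketch under stated assumptions: `MOTIF` and `find_motif` are hypothetical names, and the example fragment in the usage section is made up for illustration, not taken from the Tgo pol sequence.

```python
import re

# K, any 3 residues, N, S, any residue, Y, G, any 2 residues, G.
# This is a literal translation of "Kx3NSxYGx2G"; MOTIF is an illustrative name.
MOTIF = re.compile(r"K.{3}NS.YG.{2}G")


def find_motif(sequence: str):
    """Return (1-based start position, matched text) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical fragment used only to exercise the pattern.
    fragment = "AAKILANSFYGYYGAA"
    for start, text in find_motif(fragment):
        # The conserved Tyr sits 7 residues after the Lys, which matches
        # the Lys-487 / Tyr-494 numbering quoted in the text.
        print(start, text, "Tyr at", start + 7)
```

The pattern only encodes the consensus as written; real family alignments may tolerate substitutions that a strict regular expression would miss.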

The active site of B family polymerases contains a DTDS
motif, which, however, is DTDG in the archaeal subfamily. In
Tgo pol, the relatively conserved Tyr-402 from the adjacent
strand provides an alternate alcohol group, at a position
appropriate for metal coordination or binding of the 3' end of
the primer. The orientation of Tyr-402 is stabilized by an
aromatic cluster that also includes Phe-545 and Tyr-538.
Archaeal Methanococcus voltae and Thermococcus sp. pols
(see Fig. 3) have Tyr at position 545 (Phe in Tgo) rather than
at 402, but might also supply an alcohol group. The displacement
of a functional alcohol from serine in DTDS to Tyr-402
or Tyr-545 might stabilize its orientation as an adaptation for
thermostability.

The conserved cluster of acidic amino acids (E578, E580)
forms an unexpected metal-binding site for Mn²⁺ and Zn²⁺
(Fig. 4). Its proximity to Asp-404 and to the expected location
of the dNTP γ-phosphate suggests a supporting role in nucleotide
binding and/or catalysis.

3' → 5' Exonuclease Active Site. Tgo pol is characterized by
a strong 3' → 5' exonuclease activity, unlike eukaryotic B type
polymerases (unpublished results). The exonuclease active site
is formed at the interface between the exonuclease domain and
the tip of the thumb (Fig. 5). All residues required for catalysis
are located in the exonuclease domain, which, at least for T4
gp43, retains activity when dissociated from the polymerase
(47). However, the thumb domain, with, for example, RB69
gp43's Phe-123-base intercalation, partially controls the binding
geometry of single strand DNA (21, 43).

The exonuclease structures of Tgo and gp43 DNA polymerases
are similar at the editing site but differ considerably
at the exonuclease-thumb interface. Strand 10 contains the
metal-binding D141IE motif and readily superimposes with
the equivalent strand from gp43, allowing modeling of a
single-strand DNA segment into the exonuclease site based on
the RB69 gp43-p(dT)4 complexes (21). The conserved residues
Asp-141 and Glu-143 in the Exo I motif, Tyr-209, Asn-210,
Phe-214, and Asp-215 in Exo II, and Tyr-311 and Asp-315 in
Exo III are in approximate DNA-binding conformations (Fig.
5A). However, the editing cleft is constricted by a displacement
of the tip of the thumb toward the exonuclease domain to
prohibit single-strand binding (Fig. 5B). This shift is correlated
with a large change in the loop between strands 10 and 11. In
RB69 gp43 (and likewise in T4 gp43), this loop forms a lid over
the 3' base and contains Phe-123, which intercalates between
the first two bases. In Tgo pol, the loop is curved outward, away
from the thumb, and Phe-152 (the equivalent of gp43's Phe-123)
attaches to Phe-214 10 Å away from the intercalation site.
This shift allows the tip of the thumb to move into the editing
channel and to block the exonuclease site.

Are There Different Conformations in Polymerase and
Editing Mode? If a closed conformation of the exonuclease
domain prohibits single strand binding, an open conformation
is required for editing. The observed closed conformation may
represent the enzyme in "polymerase" mode. Preliminary
analysis of the crystal structure of Tgo pol in the low-salt
conditions indicates a structural change at the interface of
exonuclease and thumb, possibly reflecting a transition between
open and closed forms. The closed conformation observed
here may, however, be a nonphysiological artifact of the
high ionic strength used for crystallization. Crystal structures
of the enzyme in both polymerase and editing modes are
required.

Adaptation to High Temperatures. Tgo is a sulfur-metabolizing,
extremely thermophilic archaeon, with a growth
range between 55°C and 98°C. For accurate replication at this
temperature range, the polymerase must not only be stable, but
must also adequately bind substrate DNA. A comparison with
gp43 from the mesophilic bacteriophage RB69 indicates several
such adaptations to high temperatures. Several loops are
shorter in Tgo pol than in gp43 (Fig. 2), and there is an increase
in hydrogen bonded β-strand content: Tgo pol secondary
structure includes 41% helix, 22% β-strands, and 19% turns
(calculated according to ref. 48), whereas gp43 has 42% helix,
17% β-strands, and 19% turns.

Although rare among cytoplasmic or nuclear proteins, two
disulfide bridges might be formed: cysteine pairs 428-442 and
506-509, although reduced, are poised for attachment (Fig. 2).
Model refinement and electron density inspection with and
without constraints for the disulfide bonds verified the reduced
state (observed unrestrained SG-SG distance: 2.8 Å and 3.0
Å). This is consistent with our E. coli expression and further
rules out structural perturbation by nonnative oxidation. These
cysteines are located in the palm domain and are conserved
among B type enzymes from hyperthermophilic sulfur-metabolizing
archaeons, but not among mesophile homologs
(Fig. 3). The Cys-428-Cys-442 bridge stabilizes the compact
fold of the loop segment between helix N in the palm domain
and helix O in the fingers and presumably also the relative
orientation of these helices. In addition, the loop segment
packs against helix Q in the palm domain. Helix Q, the spine
of the palm domain, is further stabilized at the first helical turn
by the second disulfide bridge between Cys-506 and Cys-509.
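The reduced state of the two cysteine pairs is inferred from the SG-SG distances, which are longer than a covalent S-S bond (roughly 2.05 Å). A distance check of this kind can be sketched as follows. The cutoff value and the names `DISULFIDE_CUTOFF` and `looks_oxidized` are assumptions made for illustration, and the coordinates in the usage section are synthetic, chosen only to reproduce the two distances quoted in the text.

```python
import math

# Assumed tolerance (in Å) around the ~2.05 Å length of a covalent S-S bond.
DISULFIDE_CUTOFF = 2.3


def sg_distance(sg1, sg2) -> float:
    """Euclidean distance between two SG atom positions given as (x, y, z)."""
    return math.dist(sg1, sg2)


def looks_oxidized(sg1, sg2, cutoff: float = DISULFIDE_CUTOFF) -> bool:
    """True if the two sulfur atoms are close enough to be covalently bonded."""
    return sg_distance(sg1, sg2) <= cutoff


if __name__ == "__main__":
    # Synthetic coordinates placed 2.8 Å and 3.0 Å apart along x.
    pairs = {
        "Cys-428/Cys-442": ((0.0, 0.0, 0.0), (2.8, 0.0, 0.0)),
        "Cys-506/Cys-509": ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0)),
    }
    for name, (a, b) in pairs.items():
        state = "oxidized" if looks_oxidized(a, b) else "reduced"
        print(name, round(sg_distance(a, b), 2), state)
```

A single distance threshold is a simplification; the text also relies on refinement with and without disulfide restraints and on inspection of the electron density.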

A much enhanced complementary positive potential for all 
three DNA-binding clefts of Tgo pol is observed relative to 
gp43 (Fig. 2). Thus, in addition to hydrogen bonding and 
specific DNA-protein interactions, binding to Tgo pol has an 
additional strong stabilizing electrostatic component. 

An increase in the number of salt bridges is often associated
with thermostability. Although Tgo pol has a greater total
number of charged residues (262) than gp43 (245), both
molecules have 54 salt bridges within a 3-5 Å bound. However,
in the 5-7 Å range of charge distance, Tgo pol has 77 ion pairs
compared with 43 for gp43. This large increase results in a
more highly charged surface of Tgo pol, accompanied by a
more balanced charge distribution, compared with gp43 where
charges are often located in patches (Fig. 2).
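Counting ion pairs by distance range, as in the 3-5 Å and 5-7 Å comparison above, amounts to binning the distances between oppositely charged groups. The sketch below is a simplified illustration, not the procedure used in the paper: `count_ion_pairs` is a hypothetical helper, it works on one representative point per charged group, and the toy coordinates are invented.

```python
import math
from itertools import product


def count_ion_pairs(positive, negative, d_min: float, d_max: float) -> int:
    """Count positive/negative point pairs with d_min <= distance < d_max (Å).

    `positive` and `negative` are sequences of (x, y, z) tuples, one per
    charged group (an assumption; real analyses choose specific atoms).
    """
    return sum(
        1
        for p, n in product(positive, negative)
        if d_min <= math.dist(p, n) < d_max
    )


if __name__ == "__main__":
    # Toy coordinates: pair distances are 4, 16, 6 and 6 Å.
    positive = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    negative = [(4.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
    print("3-5 Å:", count_ion_pairs(positive, negative, 3.0, 5.0))
    print("5-7 Å:", count_ion_pairs(positive, negative, 5.0, 7.0))
```

The half-open interval keeps a pair at exactly 5 Å from being counted in both ranges.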
Crystal structure of an archaebacterial DNA polymerase

Background: Members of the Pol II family of DNA polymerases are responsible
for chromosomal replication in eukaryotes, and carry out highly processive DNA 
replication when attached to ring-shaped processivity clamps. The sequences 
of Pol II polymerases are distinct from those of members of the well-studied 
Pol I family of DNA polymerases. The DNA polymerase from the 
archaebacterium Desulfurococcus strain Tok (D. Tok Pol) is a member of the 
Pol II family that retains catalytic activity at elevated temperatures. 

Results: The crystal structure of D. Tok Pol has been determined at 2.4 Å
resolution. The architecture of this Pol II type DNA polymerase resembles that
of the DNA polymerase from the bacteriophage RB69, with which it shares less
than ~20% sequence identity. As in RB69, the central catalytic region of the
DNA polymerase is located within the 'palm' subdomain and is strikingly similar
in structure to the corresponding regions of Pol I type DNA polymerases. The
structural scaffold that surrounds the catalytic core in D. Tok Pol is unrelated in
structure to that of Pol I type polymerases. The 3'-5' proofreading exonuclease
domain of D. Tok Pol resembles the corresponding domains of RB69 Pol and
Pol I type DNA polymerases. The exonuclease domain in D. Tok Pol is located
in the same position relative to the polymerase domain as seen in RB69, and on
the opposite side of the palm subdomain compared to its location in Pol I type
polymerases. The N-terminal domain of D. Tok Pol has structural similarity to
RNA-binding domains. Sequence alignments suggest that this domain is
conserved in the eukaryotic DNA polymerases δ and ε.

Conclusions: The structure of D. Tok Pol confirms that the modes of binding of
the template and extrusion of newly synthesized duplex DNA are likely to be
similar in both Pol II and Pol I type DNA polymerases. However, the mechanism
by which the newly synthesized product transits in and out of the proofreading
exonuclease domain has to be quite different. The discovery of a domain that
seems to be an RNA-binding module raises the possibility that Pol II family
members interact with RNA.



Introduction 

DNA polymerases can be classified into at least three families
on the basis of sequence similarities to the three distinct
DNA polymerases of Escherichia coli: Pol I, Pol II and
Pol III [1]. Members of the Pol I family have been studied
extensively, resulting in a comprehensive understanding
of their functional properties and their structure [2-6]. In
contrast to the detailed knowledge that is now available
for the Pol I family, the Pol II and Pol III polymerases are
poorly understood. The first crystal structure determined
for a Pol II family member was that of the DNA polymerase
of the bacteriophage RB69 (RB69 Pol) [7] and no
structural information is currently available for any
member of the Pol III family. Members of the Pol II (also
known as Pol B or Pol α) and Pol III families carry out
processive replication of chromosomal DNA during cell division
[8], and there is interest in further extending our
knowledge of their structures and mechanism. Archaebacterial
DNA polymerases and the eukaryotic DNA polymerases
α, δ and ε are members of the Pol II family [1].

The structure of RB69 Pol revealed that the general architecture
of the core of the Pol II polymerases is strikingly
similar to that of the Pol I polymerases [7]. Pol I polymerases
are constructed from three smaller subdomains,
termed the thumb, palm and fingers regions by analogy to
elements first noted in the structure of the Klenow fragment
of E. coli DNA polymerase I [9]. In addition, Pol I
DNA polymerases have a proofreading 3'-5' exonuclease
domain located below the thumb subdomain, near the
region where duplex DNA exits the polymerase active site
[4,5]. Besides the residues involved in catalysis, there is no
significant sequence similarity between the polymerase
domains of members of the Pol I and Pol II families [1].
However, the subdomain architecture of the Pol I family is
conserved in the RB69 structure, even though the detailed
structures of the subdomains are quite divergent [7]. The
exonuclease domains of Pol I and Pol II DNA polymerases
are closely related in sequence and, not surprisingly, the
structure of the exonuclease domain of RB69 resembles
that of the Pol I type polymerases. Given the general similarity
in the polymerase domains of the Pol I polymerases
and RB69, the location of the exonuclease domain in RB69
was a surprise. In RB69 the 3'-5' exonuclease domain is
located above the fingers and opposite the thumb subdomains,
suggesting that the shuttling of DNA between the
polymerization and proofreading sites must occur by a different
mechanism in Pol II DNA polymerases [7].

The mechanism of the Pol I family DNA polymerases is
now understood in detail [4,5,10,11,28]. The chemistry of
nucleotide addition is mediated by two metal ions that are
liganded by two aspartate residues. These are located in
the palm subdomain, at the base of a deep cleft in the polymerase
domain. High-resolution crystal structures of the
Pol I type DNA polymerases of T7 bacteriophage (T7 Pol)
and Thermus aquaticus (Taq Pol) complexed to primer-template
DNA and incoming nucleotide have been determined,
allowing the mechanisms of nucleotide
incorporation and selectivity to be visualized [10,11,28].
Although corresponding structural information for the Pol
II family DNA polymerases is lacking, similarities in
general organization of the polymerase core as well as
sequence conservation within crucial elements of the
central palm subdomain suggest that general features of the
recognition of DNA will be similar in Pol II polymerases.

The DNA polymerase from the archaebacterium Desulfurococcus
strain Tok (D. Tok Pol) is a member of the Pol
II family, and has both thermostable DNA polymerase
and 3'-5' exonuclease activities [12]. D. Tok Pol sustains
undiminished DNA polymerase activity after incubation
at 95°C for one hour (RL, unpublished results). The
sequence of D. Tok Pol is very closely related (> 75%
identity) to that of other archaebacterial DNA polymerases,
such as those from Pyrococcus furiosus [13] and
Thermococcus litoralis [14]. D. Tok Pol is also related to
eukaryotic DNA polymerases α, δ and ε (34% sequence
identity over 196 residues of the DNA polymerase core for
the human δ sequences) [1]. The archaebacterial genomes
also contain genes coding for proteins with clear homology
to proliferating cell nuclear antigen (PCNA), the DNA
polymerase clamp in eukaryotes, as well as subunits of the
clamp-loader complex RF-C (replication factor C). It is
likely that archaebacterial DNA polymerases achieve processivity
by attachment to the ring-shaped PCNA clamp,
although direct evidence for such a mechanism is lacking.
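Percent sequence identity figures such as those quoted above are computed from an alignment by counting matching columns. The sketch below shows one common convention and makes no claim about the one used in the paper: `percent_identity` is a hypothetical helper that ignores any column containing a gap, and the two aligned strings in the usage section are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length). Other conventions
    divide by the alignment length or by the shorter sequence instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Invented alignment: 4 identical columns out of 6 ungapped ones.
    print(round(percent_identity("DTDG-LY", "DTDSALF"), 1))
```

Because the denominator differs between conventions, identity values from different sources are only comparable when the convention is stated.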

We have determined the structure of D. Tok Pol at 2.4 Å
resolution. D. Tok Pol shares less than 20% sequence
identity with RB69 Pol, but the structures of the two
enzymes resemble each other closely. The structure
reported here has been determined in the absence of
DNA. Nevertheless, the close structural correspondence
between the active sites of Pol I and Pol II DNA polymerases
allows inferences to be made about the mode of
DNA recognition by D. Tok Pol. The very N-terminal
region of D. Tok Pol contains a domain (residues 1-132)
that is closely related in structure to single-stranded RNA-binding
domains (RBDs), also known as RNA-recognition
modules (RRMs) [15]. The structure of the 3'-5' proofreading
exonuclease domain of D. Tok Pol is similar to
those of the Pol I type polymerases. However, its location
relative to the palm subdomain resembles the location
seen in RB69 [7] rather than the Pol I type polymerases
[9,16,23]. The structure of D. Tok Pol reported here provides
further evidence that the mode of DNA-template
recognition and the distinct editing channel established
for the Pol II family by the structure of RB69 Pol is valid
for the entire Pol II family.

Results and discussion 
Structure determination 

Crystals of D. Tok Pol have been obtained from
2-methyl-2,4-pentanediol (MPD) (Native I) and polyethylene
glycol (PEG) 400 (Native II). Both crystal forms are
orthorhombic (P2₁2₁2₁; a = 64.8 Å, b = 107.6 Å, c = 153.2 Å
for Native I and a = 66.1 Å, b = 107.6 Å, c = 155.9 Å for
Native II). Experimental phases (Table 1) to 3.0 Å were
obtained from four isomorphous heavy-atom derivatives,
using Native II and the program SHARP [17]. Phases were
improved by iterative cycles of real-space density modification,
consisting of solvent flipping and negative density
truncation, using SOLOMON [18,19]. The resulting electron-density
map allowed the chain to be traced unambiguously,
with ready determination of sequence register.
The model was refined to 2.6 Å against data for Native II
(R value = 24.2%, Rfree = 29.5%) and subsequently to 2.4 Å
against data for Native I (R value = 25.3%, Rfree = 29.9%),
using CNS [20]. The model for Native II is somewhat
more complete (see the Materials and methods section)
and is used for most of the discussion. This model includes
740 residues from 1 to 756 in Native II. Amino acids
386-390 and 665-676 are not visible in our electron-density
maps and are not included in the model.

General description of the structure 

D. Tok Pol (Figure 1) is composed of a polymerase 
domain (residues 390-773) and an exonuclease domain 
(residues 133-385), as well as an N-terminal domain 
(residues 1-131) that is not found in Pol I type DNA poly- 
merases [4]. The polymerase domain is further comprised 
of three smaller subdomains, termed the thumb (residues 
607-756), palm (residues 390-445 and 500-606) and 
fingers (residues 446-499). The structures of the MPD 
and PEG400 crystal forms of D. Tok Pol are very similar
in terms of the individual subunits. The major difference
between the two structures is a rotation of ~8-10° in the
orientation of the exonuclease domain with respect to the
thumb subdomain.

Table 1

Data collection, structure determination and refinement statistics.

                 Resolution  Reflections  Completeness  Rsym*       Riso†  Sites  Phasing      Figure of
                 (Å)         (unique)     (%)           (%)         (%)           power‡       merit§
Native data
  Native II      50.0-2.6    32,909       93.9 (57.8)   5.2 (15.6)
  Native I       50.0-2.4    40,540       92.2 (53.8)   4.6 (31.9)  57.4
MIRAS analysis                                                                                 0.367
  Pt             50.0-3.0    40,316       97.9 (93.0)   8.4 (22.9)  19.1   4      1.34 (0.99)  0.214
  Pb             50.0-3.0    34,905       84.2 (70.4)   5.9 (18.6)  13.6   1      1.16 (0.98)  0.195
  Pt/Pb          50.0-3.0    35,107       80.8 (57.4)   9.9 (21.9)  18.3   5      1.59 (0.80)  0.221

Refinement
                 Resolution  Reflections  Rworking¶/Rfree‖  Total number  Rmsd for   Rmsd for    Rmsd for
                 (Å)         (|F| > 2σ)   (%)               of atoms      bonds (Å)  angles (°)  B values (Å²)
  Native II      50.0-2.6    31,591       24.2/29.5         6,167         0.008273   1.61591     1.691
  Native I       50.0-2.4    37,229       25.3/29.9         6,145         0.008273   1.50479     1.409

*Rsym% = 100 × Σ|I − <I>| / ΣI, where I is the integrated intensity of a
given reflection. †Riso% = 100 × Σ|FPH − FP| / ΣFP, where FPH and FP are
the derivative and native structure factor amplitudes, respectively.
‡Phasing power = Σ|FH| / Σ||FPH(obs)| − |FPH(calc)||, where FH is the
calculated heavy atom structure factor amplitude. §Figure of
merit = <|ΣP(α)e^(iα) / Σ|P(α)||>, where α is the phase and P(α) is the
phase probability distribution.
¶Rworking = Σ|F(obs) − F(calc)| / ΣF(obs).
‖Rfree = Σ|F(obs) − F(calc)| / ΣF(obs), calculated using 10% of the
data. Numbers in parentheses apply to the highest resolution shell.
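The R values quoted for the D. Tok Pol refinement are residuals of the form Σ|F(obs) − F(calc)| / ΣF(obs), with the free R value computed on a subset of reflections set aside from refinement (10% of the data according to the Table 1 footnotes). The calculation can be sketched as below. The function names and the boolean `free_flags` convention are assumptions for illustration, and the amplitudes in the usage section are invented numbers, not data from the paper.

```python
def r_factor(f_obs, f_calc) -> float:
    """Return 100 * sum(|Fobs - Fcalc|) / sum(Fobs) for paired amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have equal length")
    total = sum(f_obs)
    if total == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return 100.0 * sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / total


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags (True = test set) and return (R_work, R_free)."""
    work_obs = [o for o, free in zip(f_obs, free_flags) if not free]
    work_calc = [c for c, free in zip(f_calc, free_flags) if not free]
    free_obs = [o for o, free in zip(f_obs, free_flags) if free]
    free_calc = [c for c, free in zip(f_calc, free_flags) if free]
    return r_factor(work_obs, work_calc), r_factor(free_obs, free_calc)


if __name__ == "__main__":
    # Invented amplitudes; the second reflection is held out as the test set.
    f_obs = [100, 200, 300, 400]
    f_calc = [90, 220, 270, 400]
    flags = [False, True, False, False]
    print(r_work_and_free(f_obs, f_calc, flags))
```

In practice the scale factor between observed and calculated amplitudes is refined as well; the sketch assumes both lists are already on a common scale.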

The domains of D. Tok Pol are arranged as an irregularly 
shaped flattened ring with a central cavity located near 
the polymerase active site. The mostly a-helical thumb 
subdomain forms one side of the active-site cleft and 
makes contacts with the exonuclease domain (Figure 1). 
The structures of the thumb domains of various poly- 
merases are often unrelated in structure. However, in all 
cases where structures are available the thumb domain is 
seen to fulfil an important role by forming contacts with 
duplex DNA as it exits the polymerase active site [4]. The
D. Tok Pol structure has been determined in the absence 
of DNA, and a portion of the thumb subdomain that is 
likely to contact DNA (residues 665-676) is disordered. 
This is commonly observed for the corresponding regions 
of other polymerases in the absence of substrate 
[9,21-24]. In the DNA polymerases from bacteriophage 
T4 and RB69, the thumb subdomains also provide a 
C-terminal element that interacts with the processivity 
clamp [25,26]. In D. Tok Pol, the corresponding region 
(residues 757-773) is disordered. 

The central region of the active-site cleft is occupied by 
the palm subdomain and includes residues important for 
substrate discrimination and the catalysis of the polymerase
reaction. In D. Tok Pol, the palm is organized
around three β strands (β16, β19, β20) flanked by an α
helix (αQ) (Figures 1, 2a, 3a). It contains two disulfide
bonds (Cys428-Cys442, Cys506-Cys509) that have not
been previously observed in palm subdomains and which
may be important for thermostability (Figure 1).



Figure 1 




Structure of D. Tok Pol. The structure is represented by cylinders for 
helices, arrows for strands, and a thin worm for other secondary 
structural elements. Two gray spheres represent metal ions (presumed 
to be Mg2+) observed to be bound to the exonuclease domain. The
active site of the polymerase is marked by the location of two 
aspartate residues D404 and D542. The two disulfide bonds are 
indicated. Regions of the polypeptide chain that could not be modeled 
in the palm subdomain because of disorder are indicated by dotted 
lines. The various domains and subdomains and their boundaries are 
indicated in the bar. 
Figure 2

Comparison of DNA polymerase structures.
(a) A view of the secondary structural
elements of the polymerase active-site region
(palm and fingers subdomains) of D. Tok Pol,
colored as in Figure 1. (b) The corresponding
region of T7 DNA polymerase including the
primer-template duplex from the crystal
structure (PDB code 1T7P [11]). The
orientation of T7 Pol was derived by
superposition onto strands β16, β19, and β20
of D. Tok Pol. D. Tok Pol helix αP is seen to
be in an analogous position relative to the
active-site aspartates as T7 Pol αO. (c) A
GRASP surface representation of D. Tok Pol
with modeled primer-template duplex from the
T7 DNA polymerase-DNA complex (PDB
code 1T7P [11]). The surface is colored
according to sequence similarity (40-100%)
calculated as in Figure 7c. The primer strand
is an orange worm representing phosphate
positions, and the template strand is in gray.
[Panel labels: (a) D. Tok Pol; (b) T7 DNA Pol I; (c) primer-template DNA from T7 Pol]


The central elements of the palm subdomains from polymerases
belonging to the Pol I and Pol II families can be
aligned closely (the root mean square deviation [rmsd] in
Cα positions for strands β16, β19, β20 and helix αQ is in
the range of 0.9-2.0 Å), indicating a potential conservation
of function. There are two residues in the palm domains



Figure 3 
[Panel labels: Thumb; (a) D. Tok Pol; (b) RB69 Pol]



Comparison of the structures of D. Tok Pol 
and RB69 Pol. The structures of 
(a) D. Tok Pol and (b) RB69 Pol are 
presented in the same orientation after 
superposition of their respective palm 
subdomains. Structural elements that are in 
common between the structures are 
represented and colored as in Figure 1.
Elements that are unique to RB69 Pol are 
colored in gray. Disordered segments are 
indicated by dotted lines. The N-terminal and 
the exonuclease domains are not shown. 
of Pol I polymerases that are crucial for enzymatic activity
because they coordinate two metal ions [2,10,11,27]. The 
corresponding residues in D. Tok Pol are Asp404 and 
Asp542 (Figure 1). No metal ions are, however, visible in 
our electron-density maps. 

The fingers subdomain in D. Tok Pol consists of a set of
antiparallel α helices (αN, αO, αP; Figure 2). These
helices are shorter in length than the corresponding elements
of RB69 Pol, and a helical segment that connects
helices O and N in RB69 Pol is missing altogether
(Figure 3). The fingers domain of D. Tok Pol is unrelated
in overall structure to that of Pol I type polymerases
(Figure 2). However, helix αP in D. Tok Pol is positioned
similarly to helix O in Pol I polymerases (Figure 2), and is
likely to play an analogous and crucial role in recognition
of the incoming nucleotide [9-11,28].

The 3'-5' exonuclease domain in D. Tok Pol is located
opposite the thumb subdomain and above the fingers subdomain,
as noted for RB69 Pol. It contains two metal ions
(presumably Mg2+) ligated to Asp141 and Glu143
(Figure 1). The position of this domain relative to the
polymerase active site is distinct from the arrangement
seen in Pol I type polymerases. The conservation between
RB69 and D. Tok Pol of the location of the exonuclease
domain suggests that this is a characteristic feature of Pol
II type polymerases. The structure of the D. Tok Pol
3'-5' exonuclease domain resembles those associated with
other DNA polymerases [29,30]. The 3'-5' exonuclease
domains from the Pol I (E. coli, T. aquaticus, Bacillus subtilis,
bacteriophage T7) or Pol II (RB69) polymerase families
can be aligned onto each other closely (rmsd in Cα
positions for strands β10, β11, β12, β14 and helices αE and
αI is in the range of 1.0-2.8 Å). This alignment superimposes
residues associated with substrate binding, catalysis
and metal binding in a satisfactory manner (Figure 4) [4].

The arrangement of the N-terminal, exonuclease, and 
polymerase domains creates two deep grooves leading into 
and out of the polymerase active site. The D groove (for 
duplex-DNA binding, following the nomenclature of [7]) 
is located immediately below the thumb subdomain and 
includes a region of positive electrostatic potential. The 
T groove (for template-DNA binding) leads away from
the active site in the opposite direction and is located
below the fingers subdomain. A small channel (the editing
channel) leads from the polymerase domain to the exonuclease
active site (Figure 2c).

We have used the structure of T7 Pol bound to primer- 
template DNA to model DNA onto D. Tok Pol 
(Figure 2c). Superposition of the palm subdomains of the 
two polymerases shows that remarkably few bad contacts 
are formed between the DNA (from T7 Pol) and atoms in 
the D. Tok Pol model. The one region that does collide 



Figure 4 




Structural alignment of exonuclease domains. Structures of
exonuclease domains from KF, 1WAJ, 1T7P, 1BDF, and 1TAQ have
been aligned by superimposing residues 137-145, 158-164,
167-172, 205-220, 257-260, and 303-313, which represent
strands β8, β9, β10, β12, β15 and helices αE, αI. A color gradient is
used to depict the average rmsd for the family of superimposed
structures ranging from blue (1.0-1.5 Å) to white (> 4.0 Å). Residues
conserved amongst exonuclease sequences and implicated in
catalysis are drawn in green ball and stick representation. Two gray
spheres represent two metal ions bound at the active site. The active
site is also indicated by a tetranucleotide (in gold) derived from
superposition of the exonuclease domain from the RB69 Pol structure.



with the DNA is the segment connecting the exonuclease 
and polymerase domains. This region (residues 377-390) 
is partially disordered in the D. Tok Pol structures, and is 
likely to reorganize upon binding DNA. This superposi- 
tion allows five base pairs of DNA to be accommodated in 
the D. Tok Pol active site, with the formation of 
DNA-protein contacts. The formation of contacts with 
additional base pairs would require a change in the posi- 
tion of the thumb subdomain in the region of the 
D groove. A change in the conformation of the fingers subdomain
(helices αO and αP) is also required to position
residues Lys487 and Tyr493 (or Tyr494) of D. Tok Pol
(Figure 2) for interaction with the incoming nucleotide, by
analogy with the T7 Pol structure [11]. Finally, the superimposed
primer-template DNA is well positioned so that
the incoming template strand will probably reside in the 
T groove. Superposition of the DNA molecule derived 
from the structure of HIV-1 reverse transcriptase com- 
plexed to DNA [31] leads to similar conclusions. 

Comparison between D. Tok Pol and RB69 Pol 

Although the DNA polymerases from D. Tok and bacteriophage
RB69 share less than 20% primary sequence
identity (Figure 5), their structures resemble each other

Figure 5



Structure-based sequence alignment for D. Tok Pol (DTOK), RB69 Pol
(RB69) and human Pol δ (HUMd). The HUMd sequence begins at
residue 110, as indicated by the number at the beginning of the
sequence. The alignment is colored by sequence similarity (40%, white
to 100%, green) calculated as described in Figure 7c. Shown here is a
small subset of a larger set of sequences that were used to generate
the alignment. The full sequence alignment is available at
http://www.rockefeller.edu/Kuriyan. The respective secondary
structural elements, colored as in Figure 1, are represented by helices
as cylinders, strands as arrows, and other as thin lines. Gray circles
represent portions of the polypeptide chain that could not be modeled.



closely (Figure 3). Not surprisingly, the regions of highest 
sequence similarity are concentrated in and around the 
exonuclease and polymerase active sites (Figures 2c,5). 
Despite the low overall sequence identity, the individual 
subdomains in the two structures superimpose well (the 
rmsd in Cα positions in the fingers, thumb and palm subdomains
is in the range of 0.8 to 1.5 Å). Moreover, the
overall arrangement of domains and subdomains with 
respect to each other is preserved in the two polymerases, 
strengthening the proposal that Pol II DNA polymerases 
share a common architecture (Figure 3). 
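
The rmsd values quoted for superimposed subdomains reduce to a simple calculation once equivalent Cα atoms have been paired and the two structures placed in a common frame. The sketch below assumes that the superposition has already been carried out and that the coordinates are supplied as matched lists; the function name and the toy coordinates are illustrative and are not taken from the deposited models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) positions in Å.

    Assumes both lists are already superimposed and list equivalent atoms
    in the same order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom displaced by 1 Å along z.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(model_a, model_b))
```

Because the value depends on which atoms are paired, an rmsd reported for core strands and helices only, as here, is not directly comparable with one computed over all residues.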

One difference between the overall structures of 
D. Tok Pol and RB69 Pol concerns the orientation of the 
exonuclease domain with respect to the rest of the structure. 
When the two polymerases are superimposed on their 
respective palm subdomains it is seen that the exonuclease 
domain of RB69 is rotated inwards by ~8°, burying the
active site in a solvent-inaccessible configuration [7]. In con- 
trast, the exonuclease domain in D. Tok Pol has its active 
site essentially exposed to solvent. It is possible that confor- 
mational changes between open and closed configurations 
of the exonuclease domain are a part of the functional cycle 

Figure 6 



Comparison of surface charges in D. Tok Pol 
and RB69 Pol. Accessible-surface 
representation of (a) D. Tok Pol and (b) RB69 
Pol in the same orientation after superposition 
of their palm subdomains. Surface regions 
corresponding to the terminal oxygen atoms 
of aspartate and glutamate are colored red,
whereas surface regions contributed by the
sidechain nitrogen of lysines and arginines are
colored blue. D. Tok Pol has a striking pairing
of oppositely charged residues not seen in
RB69 Pol. A representation of D. Tok Pol as a
worm is included for orientation. 



of the protein, particularly as the two different forms of 
D. Tok Pol differ in the orientation of the exonuclease 
domain (not shown). 

One interesting difference between D. Tok Pol and RB69 
Pol is that the former is a thermostable DNA polymerase 
whereas the latter is not. Unfortunately, attempts to identify
features in the D. Tok Pol structure that might be correlated
with thermostability are complicated by the very low
sequence similarity between the two enzymes. One feature
that does stand out, however, is the increased formation of
arrays of ionic interactions on the surface of D. Tok Pol
when compared to that of RB69 Pol (Figure 6). The formation
of networks of ionic interactions has been noted to correlate
with thermostability in other proteins [16,32,33].

Generally, D. Tok Pol subdomains tend to be more
compact, with smaller helices and shorter loops than are
found in RB69 Pol, a feature that may be another important
source of thermostability. For example, the palm subdomain
displays close structural conservation of elements near
the catalytic aspartate residues. However, helix αR in
D. Tok Pol is much shorter than its counterpart in RB69
Pol, and a small substructure in front of the palm subdomain
is entirely missing in D. Tok Pol (Figures 3,5). Deletion
of these elements is also seen in a representative set of
archaebacterial DNA polymerases [13,14]. Likewise, the
fingers subdomain is missing a large mass from its tip in
D. Tok Pol (Figures 3,5). However, the RB69 fingers
extension most probably plays a T4 phage-specific role, as
it is also missing from our alignments of archaebacterial
DNA polymerases and eukaryotic polymerases δ (Figure 5).

The N-terminal domain resembles RNA-binding domains 

The N-terminal domain of D. Tok Pol has no correspond- 
ing element in Pol I type polymerases. Analysis of the 
Figure 7

The N-terminal domain of D. Tok Pol. (a) Structural conservation of
RNA-binding domains: structures of RNA-binding domains from 1HA1
(domains A and B), 1PYS, 1RIS, and 1URN (molecule 2) have been
aligned by superimposing (LSQMAN, SUPERPOSE) D. Tok Pol
residues 40-110, which represent four strands (04. £5, 05, 06) and
two helices (αA and αB). The N-terminal domain of D. Tok Pol is shown.
A color gradient is used to depict the average rmsd in Cα positions for
the family of superimposed structures, ranging from blue (1.0-1.5 Å) to
white (> 4.0 Å). Certain aromatic residues in D. Tok Pol (white) are
shown; these represent a potential RNA-binding surface. This view is
rotated by approximately 180° from that in Figure 1. (b) An RNA stem-loop
from the U1A-RNA complex (PDB code 1URN, [53]) modeled
onto the N-terminal domain of D. Tok Pol. The model was generated by
superimposing the U1A RBD onto the N-terminal domain of D. Tok Pol
using the conserved structural elements. The RNA is drawn in blue with
the sugar-phosphate backbone represented as a worm and the bases
in ball and stick representation. A partial surface that represents the
interface between the N-terminal domain of D. Tok Pol and the
exonuclease domain is shown in gray. The location of the modeled RNA
relative to the polymerase active site is depicted by marking the position
of residue Y494. The location (derived after superposition) of the
guanosine monophosphate (GMP) molecule bound to the 'incomplete'
RBD of RB69 Pol, drawn in light green, nearly overlaps with the
positions of the bases of the modeled RNA stem-loop. (c) Structural
and primary sequence alignment of RNA-binding domains. Sequence
alignment of the N-terminal domains from D. Tok Pol and RB69 Pol
(incomplete domain) and the RBDs from 1HA1 (domains A and B),
1PYS, 1RIS, 1URN (molecule 2) superimposed as in Figure 7a.
Alignments of the N-terminal domain of D. Tok Pol against DNA
polymerase δ and ε were obtained using CLUSTALX [54], using its
default parameters. The conserved primary sequence motifs RNP1 and
RNP2 are boxed. The alignment is colored by sequence similarity (15%,
white to 75%, green) calculated by averaging the similarity scores at
each position of all possible pairs of sequences (DJ, unpublished
software). Equivalence of nonidentical residues was established by use
of the BLOSUM62 amino acid substitution matrix [55]. Secondary
structural elements corresponding to the N-terminal domain of
D. Tok Pol are represented (pink) with helices as cylinders, strands as
arrows, and other as thin lines. Numbering of residues and naming of
secondary structural elements is that of D. Tok Pol.
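
The per-position similarity used to color the alignments is described only in outline, and the software itself is unpublished. The following Python sketch is therefore one plausible reading of that description rather than the authors' method: each alignment column is scored as the percentage of sequence pairs whose residues are identical or equivalent. The `EQUIVALENT` table is a small illustrative stand-in for the positive-scoring pairs of a substitution matrix such as BLOSUM62, the treatment of gaps as mismatches is an assumption, and all names are hypothetical.

```python
from itertools import combinations

# Illustrative subset of conservative substitutions (assumption, not the
# full matrix used in the paper).
EQUIVALENT = {frozenset(pair) for pair in ("IL", "IV", "LV", "KR", "DE", "FY", "ST")}


def residues_similar(a, b):
    """True if two residues are identical or listed as equivalent."""
    if a == "-" or b == "-":
        return False  # gaps are counted as mismatches (assumption)
    return a == b or frozenset((a, b)) in EQUIVALENT


def column_similarity(alignment):
    """Percent of similar sequence pairs at each column of an alignment."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    pairs = list(combinations(range(len(alignment)), 2))
    scores = []
    for col in range(lengths.pop()):
        hits = sum(
            residues_similar(alignment[i][col], alignment[j][col])
            for i, j in pairs
        )
        scores.append(100.0 * hits / len(pairs))
    return scores
```

Under this reading, a column colored at the 40% threshold would be one in which two of every five sequence pairs carry identical or equivalent residues.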



structure of this domain using DALI [34]
(http://www.embl-ebi.ac.uk/dali/) revealed a previously
unsuspected similarity to RBDs. RBDs are small modules
(80-90 residues) found in RNA-binding proteins of
prokaryotes, archaea, and eukaryotes (reviewed in [15]).
These modules adopt a conserved βαββαβ architecture
and bind to single-stranded RNA. Two conserved
sequence motifs, referred to as RNP1 (ribonucleoprotein
1) and RNP2, provide aromatic and charged residues that
are important for RNA recognition [35] (Figure 7).

The N-terminal domain of D. Tok Pol can be superim- 
posed closely onto the core secondary structural ele- 
ments of RBDs from the U1A spliceosomal protein [35], 
ribosomal protein S6 [36], the heterogeneous ribonucleo- 
protein (hnRNP) proteins (two RBD domains) [37,38] 
and the anticodon-binding domain from T. thermophilus 
phenylalanyl-tRNA synthetase [39]. The rmsds in Cα
positions for these superpositions are in the range of
0.5-2.0 Å (Figure 7a). Differences between the structures
of the loops in the N-terminal domain of
D. Tok Pol and those of the RNA-binding domains are 
within the range of structural variation seen in the 
various RNA-binding domains. 

There is no evidence at present to suggest that the N-ter- 
minal domain of D. Tok Pol binds RNA. However, com- 
parison with the structures of RNA complexes of 
RNA-binding domains shows that the N-terminal domain 
might in fact be a functional RNA-binding domain 
(Figure 7). In particular, three aromatic residues in the 
N-terminal domain (Tyr37, Tyr39 and Tyr86) could inter- 
act with RNA bases in a manner similar to that seen in 
crystal structures of RNA bound to RNA-binding domains 
[35] (Figure 7). Interestingly, these residues are located 
near the position of a guanosine triphosphate molecule 
that is found bound to the N-terminal domain of RB69 Pol 
[1] (Figure 7b). The DNA polymerases from bacterio- 
phage T4 and its distant relative bacteriophage RB69 bind 
specifically to the ribosome-binding site of their own
mRNA (messenger RNA), repressing its translation 
[40-42]. The N-terminal domains of T4 Pol and RB69 Pol 
are smaller than that of D. Tok Pol. In the RB69 Pol struc- 
ture, the N-terminal domain seems to form an 'incom- 
plete' RNA-binding domain (Figure 7c). 

There is no significant overall sequence similarity 
between the N-terminal domain of D. Tok Pol and 
RNA-binding domains, which is why the presence of 
this fold was not recognized previously (Figure 7c). 
Comparison of the sequences of other archaebacterial 
DNA polymerases and human polymerases δ and ε suggests
that a corresponding structural element is likely to
be found in these polymerases as well (Figure 7c). The 
sequence alignment in this region is unambiguous for 
the archaebacterial DNA polymerases. For eukaryotic 
polymerases the alignment is less certain, but it seems to 
conserve the essential aromatic character of the RNP 
motifs (Figure 7c). Confirmation of the presence of 
these domains along with their ability to bind RNA, and 
their precise role in eukaryotic DNA synthesis awaits 
future structural and functional studies. 



Biological implications 

The structure of the DNA polymerase from the archaebacterium
Desulfurococcus strain Tok, D. Tok Pol,
reveals a strong similarity to the DNA polymerase from 
bacteriophage RB69. It also reveals the presence of an 
N-terminal domain that has structural similarity to RNA- 
binding domains from the U1A spliceosomal protein, 
ribosomal protein S6, the hnRNP proteins and the anticodon-binding
domain from T. thermophilus phenylalanyl-tRNA
synthetase. Although the structure in the
immediate vicinity of the central catalytic region of the 
polymerase domain closely resembles that of Pol I type 
DNA polymerases, the overall architecture of D. Tok Pol 
and the placement of the exonuclease domain is strikingly 
different. The similarity between D. Tok Pol and RB69 
Pol suggests that these two structures are representative 
of a common Pol II polymerase fold. Members of this 
family carry out chromosomal DNA replication in 
eukaryotes, including humans, and yet there is no struc- 
tural information available for any eukaryotic member of 
this family. While this manuscript was being prepared, the 
structure of another archaebacterial DNA polymerase, 
that from the organism Thermococcus gorgonarius, was
reported [56]. The D. Tok Pol structure reported
here, along with the RB69 Pol structure and the structure 
of the Thermococcus gorgonarius DNA polymerase, 
should now make it possible to generate reliable structural 
models for eukaryotic DNA polymerases. 

Materials and methods 

Protein expression and purification 

The D. Tok Pol bacterial expression vector and partial amino acid
sequence were generous gifts of Life Technology Corporation. Convenient
and reproducible protein expression was achieved by cloning
the D. Tok Pol gene into the pET30 plasmid (Novagen). Determination
of the amino acid sequence of the polymerase was completed using
this construct. D. Tok Pol was purified by lysing biomass prepared from
the above expression systems in a French pressure cell (Avestin).
D. Tok Pol precipitated by incubation of the soluble fraction at 80°C for
30 min was further purified by ion-exchange (High-Q, Bio-Rad) and gel-filtration
(Superdex-200, Pharmacia) chromatography. Purified protein
was concentrated to 15 mg/ml by ultrafiltration (Millipore) in 40 mM
TRIS-HCl (pH = 7.4), 50 mM (NH4)2SO4 for crystallization trials.

Crystallization, cryostabilization, and heavy-metal
derivatization 

Crystals of D. Tok Pol (maximum dimensions: 200 μm × 150 μm ×
100 μm) were prepared from 100 mM TRIS-HCl (pH = 8.6), 10 mM
MgSO4, 200 mM (NH4)2SO4, 20% (v/v) 2,4 methyl pentane diol (MPD),
11% (w/v) PEG4K, 10 mM dithiothreitol by vapor diffusion at 20°C. These
crystals were cryostabilized in 100 mM TRIS-HCl (pH = 8.6), 10 mM
MgSO4, 200 mM Li2SO4, 20% v/v MPD, 13% w/v PEG4K for 30 min and,
when shock-cooled in freshly thawed liquid propane (-180°C), diffracted
synchrotron wiggler radiation (A1 beamline, Cornell High Energy Synchrotron
Source) to Bragg spacings of 2.4 Å. D. Tok Pol crystallized in space
group P2₁2₁2₁ with cell parameters (Native I: a = 64.8 Å, b = 107.6 Å,
c = 153.2 Å, α = 90°, β = 90°, γ = 90°). VM calculations suggest that there
is one molecule per asymmetric unit with high solvent content. Native data
sets recorded under these conditions resulted in unacceptably high non-isomorphism
between frozen samples. Substitution of PEG400 for MPD in
the crystallization and stabilization media resolved this problem and
allowed structure determination by multiple isomorphous replacement
(MIR) (Native II: a = 66.1 Å, b = 107.6 Å, c = 155.9 Å, α = 90°,
β = 90°, γ = 90°). Heavy-metal derivatives were obtained by soaking
Native II crystals in stabilizing solution containing 10 mM heavy-atom
compound for 24 h.
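
The VM estimate mentioned above can be reproduced approximately from the Native I cell. The Python sketch below uses the usual Matthews relations, VM = V / (Z × M) and solvent fraction ≈ 1 − 1.23 / VM. The molecular weight of 90,000 Da is an assumed placeholder, since the mass of D. Tok Pol is not stated in this section, and Z = 4 follows from the four asymmetric units of an orthorhombic P2₁2₁2₁ cell with one molecule in each.

```python
def matthews_coefficient(a, b, c, z, molecular_weight_da):
    """Matthews coefficient (Å^3 per Da) for an orthorhombic unit cell."""
    cell_volume = a * b * c  # valid because all cell angles are 90 degrees
    return cell_volume / (z * molecular_weight_da)


def solvent_fraction(vm, protein_constant=1.23):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - protein_constant / vm


# Native I cell from the text; 90,000 Da is an assumed molecular weight.
vm = matthews_coefficient(64.8, 107.6, 153.2, z=4, molecular_weight_da=90_000)
print(f"VM = {vm:.2f} Å^3/Da, solvent fraction = {solvent_fraction(vm):.2f}")
```

With these inputs VM comes out near 3 Å³/Da and the solvent fraction a little under 60%, in line with the statement of one molecule per asymmetric unit and high solvent content; a different assumed mass would shift both numbers proportionally.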

Data collection and phase determination 
X-ray diffraction data sets from a set of shock-cooled native and isomorphous
heavy-atom derivatives were recorded at the Cornell High
Energy Synchrotron Source (CHESS) beamline A1 (λ = 0.908 Å). Data
from Native I crystals (prepared with MPD) extended to a Bragg
spacing of 2.4 Å with an Rsym = 4.6%. MIR analysis was conducted on
Native II crystals (prepared with PEG400), which yielded data to
beyond 2.6 Å. X-ray diffraction data were indexed, integrated and
scaled using the HKL package [43]. 

The positions of heavy atoms were located manually by inspection of
difference Patterson maps and checked by cross-phased difference
Fourier maps. Experimental phases were calculated using these sites
with the program SHARP [17]. In our hands, higher quality electron-density
maps were obtained by performing individual single isomorphous
replacement (SIR) calculations in SHARP and combining the
individual SIR phase sets using the program SIGMAA [19,44]. The
experimental phases were improved and extended by solvent flipping
and negative-density truncation as implemented in SOLOMON. This
procedure (SHARP/SOLOMON) yielded electron-density maps of sufficient
quality to allow the entire D. Tok Pol polypeptide to be traced
unambiguously. This map was dramatically improved over a map calculated
with MLPHARE/SOLOMON [19].

Model building and refinement 

The initial molecular model was built into a 3.0 Å electron-density map
using the interactive molecular graphics program O [45]. Model refinement
was carried out by conjugate gradient minimization, torsion-angle
dynamics, and tightly constrained atomic temperature factor refinement
in the program CNS [20]. Refinement against the 2.6 Å Native II data
set was interspersed with manual rebuilding of the model against
σA-weighted electron-density maps using (2|Fo|-|Fc|) and (|Fo|-|Fc|)
coefficients calculated by averaging structure factors of ten models
resulting from multiple torsion angle dynamics runs [46]. The original
electron-density map remained a useful guide throughout the rebuilding
process. The progress of the refinement was monitored by reductions
in Rfree (10% of the recorded reflections) [47]. Against the Native II
data set the model was refined to an Rfree = 29.5% and
Rworking = 24.2%. The refinement was continued against the 2.4 Å Native I
data set. A rigid-body search in CNS with the 2.6 Å model yielded a
clear solution that was refined as above. The final model for Native I
was refined to an Rfree = 29.9% and Rworking = 25.3%, and the final
model contains residues 1-756 with three disordered regions
(386-389, 665-676, 757-772). The Native II model contains 6030
non-solvent protein atoms, 4 sulfate ions, 2 magnesium ions, and 116
water molecules. The Native I model contains 5992 non-solvent protein
atoms, 9 sulfate ions, 2 magnesium ions, and 106 water molecules.
Model geometry was analyzed using the program PROCHECK [48].
Both models have no outliers in the Ramachandran plot, with over 80%
of the residues in the most-favored region.

Figure preparation 

Figures were composed in programs BOBSCRIPT v1.0 [49], GRASP
v1.25 [50] and RIBBONS v3.00 [51], with renderings done in
POVRAY v3.1e (http://www.povray.org). Figures 5 and 7c were composed
using ALSCRIPT [52].

Accession numbers 

Coordinates have been deposited with the Research Collaboratory for 
Structural Bioinformatics under the accession code 1QQC.

The 2.25 Å resolution crystal structure of a pol α family (family B) DNA
polymerase from the hyperthermophilic marine archaeon Thermococcus
sp. 9°N-7 (9°N-7 pol) provides new insight into the mechanism of pol α
family polymerases that include essentially all of the eukaryotic replicative
and viral DNA polymerases. The structure is folded into NH2-terminal,
editing 3'-5' exonuclease, and polymerase domains that are
topologically similar to the two other known pol α family structures
(bacteriophage RB69 and the recently determined Thermococcus gorgonarius),
but differ in their relative orientation and conformation.

The 9°N-7 polymerase domain structure is reminiscent of the "closed"
conformation characteristic of ternary complexes of the pol I polymerase
family obtained in the presence of their dNTP and DNA substrates. In the
apo-9°N-7 structure, this conformation appears to be stabilized by an ion
pair. Thus far, the other apo-pol α structures that have been determined
adopt open conformations. These results therefore suggest that the pol α
polymerases undergo a series of conformational transitions during the catalytic
cycle similar to those proposed for the pol I family. Furthermore, comparison
of the orientations of the fingers and exonuclease (sub)domains
relative to the palm subdomain that contains the pol active site suggests
that the exonuclease domain and the fingers subdomain of the polymerase
can move as a unit and may do so as part of the catalytic cycle. This
provides a possible structural explanation for the interdependence of
polymerization and editing exonuclease activities unique to pol α family
polymerases.

We suggest that the NH2-terminal domain of 9°N-7 pol may be structurally
related to an RNA-binding motif, which appears to be conserved
among archaeal polymerases. The presence of such a putative RNA-binding
domain suggests a mechanism for the observed autoregulation of
bacteriophage T4 DNA polymerase synthesis by binding to its own
mRNA. Furthermore, conservation of this domain could indicate that such
regulation of pol expression may be a characteristic of archaea. Comparison
of the 9°N-7 pol structure to its mesostable homolog from bacteriophage
RB69 suggests that thermostability is achieved by shortening loops,
forming two disulfide bridges, and increasing electrostatic interactions at
subdomain interfaces.

Introduction

DNA polymerases catalyze the template-directed
addition of nucleotides onto the 3'-OH group of the
DNA primer terminus. These enzymes replicate
DNA with the required accuracy essential for genomic
stability, but generate sufficient mutations to
stimulate and maintain evolution. Unlike
Eucarya and Bacteria, relatively little is known
about DNA replication in Archaea (Perler et al.,
1996), one of the three major evolutionary lineages
of life (Woese et al., 1990). Archaea play a significant
role in the biosphere, accounting for up to 30% of
the biomass in certain Antarctic waters (De Long
et al., 1994), and exhibit much greater diversity than
had originally been suspected (Barns et al., 1996).
Many characterized archaeal species are adapted to
live in environments of extreme temperature, pressure,
salinity, and/or pH such as hydrothermal
vents and hot springs (Rees & Adams, 1995).

Although archaeal cells share many morphological
features with Bacteria, archaeal proteins
involved in gene expression including DNA replication,
transcription, and translation have been found
to be similar to those from Eucarya (Edgell &
Doolittle, 1997; Bult et al., 1996). In particular, most
of the archaeal DNA polymerases that have been
sequenced belong to the α-like polymerase family
(family B) that includes essentially all the eukaryotic
replicative and viral DNA pols (Braithwaite & Ito,
1993; Edgell et al., 1997).

Crystal structures exist for DNA pols from each
of four families: pol I (family A), pol α (family B),
pol β (family X) and reverse transcriptase (reviewed
by Joyce & Steitz, 1994; Doublie et al., 1999).
Although pols from different families are structurally
quite diverse, several common features have
emerged. The pol domain from each resembles a
right hand and may be further divided into palm,
fingers, and thumb subdomains, as was originally
described for the large fragment of Escherichia coli
pol I (Klenow fragment) (Ollis et al., 1985). All polymerases
appear to share the same mechanism for
nucleotidyl transfer involving two divalent metal
ions (reviewed by Brautigam & Steitz, 1998). In
addition, based on structures containing DNA and
dNTP bound to pols from pol I, pol β, and reverse
transcriptase families, a conformational change in
the fingers subdomain from an open to a closed
conformation is proposed to occur during the catalytic
cycle (reviewed by Doublie et al., 1999).

The pol α family polymerases are of medical
importance as targets for development of antiviral
and anticancer therapeutics. For example, human
pol α is a target in the treatment of acute myelogenous
leukemia and chronic lymphocytic leukemia
(Keating et al., 1982; Robertson & Plunkett, 1993)
and a variety of nucleotide analogs with antitumor
activity inhibit strand elongation by pol α (Huang
& Plunkett, 1995; Gandhi & Plunkett, 1995). Furthermore,
polymerases, particularly those that are
thermostable, have a number of critical biotechnological
applications ranging from PCR to cloning and
DNA sequencing. Despite their biological, medical
and biotechnological importance, the pol α class of
polymerases has not been structurally as well
characterized as other DNA polymerase families.

Here we report the 2.25 Å resolution crystal
structure of a pol α family DNA polymerase from
the hyperthermophilic marine archaeon Thermococcus
sp. 9°N-7 (9°N-7 pol). Thermococcus sp. 9°N-7
was isolated from a hydrothermal vent at 9° N
latitude off the East Pacific Rise (Southworth et al,
1996). The structure is folded into NH2-terminal,
editing 3'-5' exonuclease, and polymerase domains
that are topologically similar to the two other
known pol α family structures (bacteriophage
RB69 (Wang et al, 1997) and the recently determined
Thermococcus gorgonarius (Tgo) (Hopfner
et al, 1999)), but differ in their relative orientation
and conformation.

The pol domain structure is reminiscent of the 
"closed" conformation characteristic of ternary 
complexes of the pol I polymerase family obtained 
in the presence of their dNTP and DNA substrates. 
In the apo-9°N-7 structure, this conformation 
appears to be stabilized by an ion pair. Thus far, 
the two other apo-pol α structures that have been
determined adopt open conformations. These
results therefore suggest that the pol α polymerases
undergo a series of conformational transitions 
during the catalytic cycle similar to those proposed 
for the pol I family. Furthermore, comparison of 
the orientations of the fingers and exonuclease 
domains relative to the palm subdomain that 
contains the pol active site suggests that the 
exonuclease domain and the fingers subdomain of 
the polymerase can move as a unit, and may do so 
as part of the catalytic cycle. This provides a poss- 
ible structural explanation for the interdependence 
of polymerization and editing exonuclease 
activities unique to pol α family polymerases.

We suggest that the NH2-terminal domain of
9°N-7 pol is structurally homologous to the
βαββαβ RNA-binding motif with an exposed patch
of aromatic amino acid residues. Bacteriophage T4 
DNA pol, which is homologous to 9°N-7 pol, is 
known to bind its own mRNA and repress its own 
synthesis. The homology relationships to the RNA- 
binding motif suggest a structural basis for this 
regulatory mechanism. Furthermore, the conserva- 
tion of this domain in other archaeal pols suggests 
that such autogenous regulation of pol expression 
may be general for archaea. 

Results and Discussion 

Crystal structure of Thermococcus sp. 
9°N-7 pol 

The structure of the full-length, 775-residue 
enzyme (bearing the double mutation D141A and
E143A) was determined using the multiple isomorphous
replacement method to a resolution of
2.25 Å. The current model has an R-factor of
23.9% (R-free = 30.8%) (Table 1). A Ramachandran
plot of the model shows 86.8% of the residues in 
the most favored region and the remainder in 
additional allowed regions (12.4%) and generously 
allowed regions (0.8%). A total of 37 residues are 
not traced in the model and lie in regions of poorly 
defined electron density. The first of these gaps 
occurs at the bottom of the palm domain (residues
568-575), and the remainder are within the thumb 
region that is frequently observed to be partially 
disordered in apo polymerase structures, as is also 
the case here (e.g. Ollis et al, 1985; Kiefer et al,
1997). Although no disulfide bridges were included 
in the refinement, four Cys residues showed anom- 
alous peaks in a difference Fourier map and side- 
chain distances and angles consistent with two 
disulfide bridges (Cys428:442, Cys506:509). 

The structure of 9°N-7 pol reveals features com- 
mon to all DNA pol structures as well as those that 
may be unique to archaeal pols. The overall shape 
of the enzyme can be described as a disc with a 
central hole that is folded into NH2-terminal, 3'-5'
exonuclease, and polymerase domains (Figure 1(a) 
and (b)). Like all other pols of known structure, the 
pol domain resembles a right hand and may be 
further divided into palm, fingers, and thumb sub- 
domains, as was originally described for the large 
fragment of E. coli pol I (Klenow fragment) (Ollis 
et al, 1985). 9°N-7 pol is similar in structure to the 
pol α family polymerase from the mesostable
bacteriophage RB69 (RB69 pol) (Wang et al, 1997),
although a number of these (sub)domains are 
shorter than in RB69 pol (Figure 1(c)). Nearly all 
these sequence length differences are attributable 
to loop segments that are fewer and shorter in the 
hyperthermostable 9°N-7. As was first observed in 
the RB69 pol structure (Wang et al, 1997), the 3'-5' 
exonuclease domain lies on the opposite side of the 
palm in comparison to pol I family polymerases. 
This domain arrangement is also seen in 9°N-7 pol 
and in Tgo pol (Hopfner et al, 1999), indicating that 
this result is likely to be general for the pol α
family. The structural similarity between 9°N-7 
and RB69 pols is significant given the low sequence 
identity (<20%) in all but the active-site (palm) 
region, where sequence identity is 42% (Figure 2). 
Similar results hold for sequence alignments 
between 9°N-7 and human pol α.
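
The sequence identity figures quoted above depend on how identity is counted over an alignment. The following is a minimal sketch of one common convention (identical positions divided by aligned, non-gap columns); the function name and the toy fragments are hypothetical and are not taken from Figure 2, and the paper does not state which convention was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only.
seq_a = "DYASLYPS-IIR"
seq_b = "DFNSLYPSAIMQ"
identity = percent_identity(seq_a, seq_b)
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported values are only comparable when the same denominator is used.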

NH2-terminal domain

Many of the members of the pol α polymerase
family, including archaeal pols, bacteriophage T4 
and RB69 DNA pols, have an NH2-terminal
domain that is not observed in the pol I family. T4 
pol is known to control its synthesis in vivo by a 
mechanism of autogenous regulation (Tuerk et al, 
1990). The mRNA-binding activity has been 
located to within the first 100 residues of the pol 
(Wang et al, 1996), but the structure of a fragment 
comprising residues 1-388 of T4 pol failed to 
suggest a structural basis for RNA binding (Wang 
et al, 1996). Here, we note that certain structural 
similarities between the homologous region in the 
9°N-7 pol and the U1A RNA-binding protein may 
provide a rationale for RNA binding by T4 pol. 

The NH2-terminal domain of 9°N-7 pol can be
considered as three modules based on compactness 
of folding (Figure 3(a)). The first module comprises 
residues 1-31, a three-stranded β-sheet that interacts
extensively with the 3'-5' exonuclease domain
via predominantly electrostatic interactions. Resi- 
dues 32-36 act as a flexible linker connecting the 
first module to the second (residues 37-123). The 
third module comprises residues 338-372. 

The second module is folded into a βαββαβ
motif, with two short β-strands, 5 and 6, inserted
between the second and third elements. This motif 
occurs in a variety of proteins, and forms the basis 
for the most prevalent RNA binding motif, the 
RNA recognition motif (RRM). The RRM is present 
in the RNA-binding domains of hnRNP A1, spliceosomal
proteins U1A and U2B", and the sex lethal
protein (Burd & Dreyfuss, 1994). Although an align- 
ment of the NH2-terminal domains of archaeal pols 
(Figure 3(b)), together with T4 and RB69 pols, 
shows that they lack the RNP1 and RNP2 sequence 
motifs that characterize the RRM (Burd & Dreyfuss, 
1994), a number of highly conserved and invariant 
residues nevertheless emerges. Most of these resi- 
dues fall in a cluster on the surface of the NH2- 
terminal domains of 9°N-7 and RB69 pols which 
therefore could mark the location of an RNA binding
site atop the β-sheet platform on the face away
from helix A (Figure 3(c)). 
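
The conserved positions discussed here rest on a consensus rule given in the legend to Figure 3: a consensus residue is assigned where at least 70% of the 13 aligned archaeal sequences share the same residue. A minimal sketch of that rule follows; the function name, the placeholder character, and the toy sequences are hypothetical, and this is not the PILEUP procedure itself.

```python
from collections import Counter


def consensus(aligned, threshold=0.70, gap="-", unknown="."):
    """Return a consensus string for equal-length aligned sequences.

    A column gets a residue only if that residue (not a gap) occurs in
    at least `threshold` of the sequences; otherwise `unknown` is used.
    """
    ncol = len(aligned[0])
    if any(len(seq) != ncol for seq in aligned):
        raise ValueError("all aligned sequences must have equal length")
    result = []
    for col in range(ncol):
        counts = Counter(seq[col] for seq in aligned)
        residue, count = counts.most_common(1)[0]
        if residue != gap and count / len(aligned) >= threshold:
            result.append(residue)
        else:
            result.append(unknown)
    return "".join(result)


# Hypothetical toy alignment of five sequences, for illustration only.
toy_alignment = ["MILDT", "MILDA", "MVLDT", "MILES", "MIL-T"]
toy_consensus = consensus(toy_alignment)
```

With 13 sequences, a 70% cutoff means that at least 10 sequences must agree at a position, since 9 of 13 is only about 69%.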

Both a sequence alignment (Figure 3(b)) and a 
structural comparison (Figure 3(c)) reveal that T4 
and RB69 pols lack helix A and strand 7 of the 
βαββαβ motif, perhaps explaining why no suggestive
structural homologies to RNA-binding folds
could be identified (Wang et al, 1996, 1997). 

Experiments are needed to determine whether 
the NH2-terminal domain of 9°N-7 pol binds RNA.
Although the βαββαβ motif occurs in proteins that
are not thought to interact with RNA (Burd & 
Dreyfuss, 1994), we find its presence in the NH2- 
terminal domain of 9°N-7 pol, in a region known to 
bind RNA in T4 pol (Wang et al, 1996), to be highly 
suggestive of this. RNA-binding capability could 
hold for other archaeal pols as well, since sequence 
alignment of the NH2-terminal domains (Figure 3(b))
suggests that they share the βαββαβ motif.

We further speculate that just as T4 pol binds its 
mRNA to down-regulate its own synthesis, such 
autogenous regulation of pol expression might 
occur in archaea. Autogenous gene regulation is 
well documented in bacteria, and has at least one 
precedent in archaea. It has been identified in 
the synthesis of the MvaL1 ribosomal protein of
Methanococcus vannielii (Hanner et al, 1994), and
postulated for a ribosomal gene cluster from the 
halophile Halobacterium cutirubrum (Shimmin & 
Dennis, 1989). It is interesting that there is no struc- 
tural evidence that such regulation extends to 
eukaryotes, as human pol α shows no significant
sequence homology to the NH2-terminal sequences
aligned in Figure 3(b). 

3'-5' Exonuclease domain 

This domain is responsible for binding single- 
stranded DNA and excising mismatched bases in 
the elongated primer strand. The structure 
Figure 1. Structure of the Thermococcus sp. 9°N-7 DNA polymerase. The NH2-terminal and 3'-5' exonuclease
domains are colored yellow and green, respectively. The polymerase domain is divided into palm (brown), thumb
(red), and fingers (blue) subdomains. Three highly conserved carboxylate groups (D404, D540, D542) mark the
polymerase active site. (a) Stereoview of the Cα trace. Every 40th Cα is numbered. Broken lines indicate disordered regions
of the protein. (b) Ribbon diagram with secondary structure elements defined according to DSSP (Kabsch & Sander,
1983). NH2-terminal domain: 1, 1-10; 2, 13-22; 3, 25-31; 4, 37-42; A, 48-51; 5, 55-58; 6, 61-64; 7, 67-75; 8, 78-86; B, 92-
101; 9, 106-110; C, 116-123; J, 341-344; K, 349-363. 3'-5' exonuclease domain: 10, 137-144; 11, 157-163; 12, 168-172; 13,
181-183; D, 187-201; 14, 205-208; E, 215-225; 15, 240-244; 16, 247-251; 17, 256-259; F, 260-266; G, 275-283; H, 292-300;
I, 305-337. Polymerase domain: L, 374-379; 18, 397-404; M, 408-415; 19, 431-433; 20, 440-442; N, 44^468; O, 473-498;
P, 507-532; 21, 535-539; 22, 543-547; Q, 553-567; 23, 578-590; 24, 593-598; 25, 603-606; R, 617-633; S, 636-651 (648-651
disordered); T, 657-660 (disordered); 26, 662-665; U, 677-688; 27, 698-703; 28, 714-716; V, 731-734; W, 742-746.
(c) Schematic comparing the (sub)domains of Thermococcus sp. 9°N-7 and bacteriophage RB69 DNA polymerases. The
domain boundaries for 9°N-7 pol were determined based upon a structure-based sequence alignment with RB69 pol
(Figure 2) as defined for the RB69 pol (Wang et al, 1997).
Figure 2. A three-way partial sequence alignment of Thermococcus sp. 9°N-7 pol (9N-7), RB69 pol (RB69), and
human pol α (HPOL). Dashes indicate gaps in the alignment, and segments not aligned are represented as amino
acid residue spans within brackets. Ticks mark every 10 spaces. The 9°N-7 and RB69 pol alignment is based upon the
crystal structures. The HPOL and RB69 alignment is from Wang et al (1997), except for a few short segments
assigned based upon the three sequences shown here. Indicated below the sequences and boxed in yellow are
consensus motifs in the exonuclease (Blanco et al, 1992) and polymerase (Wong et al, 1988) domains. The secondary
structure elements in 9°N-7 pol, as defined by DSSP, are given above the sequences. The structural elements are colored
according to the scheme described in the legend to Figure 1. Shown in purple in the 9°N-7 pol sequence are the archaeal
polymerase motifs described by Edgell et al (1997). Residues within the polymerase domain that are invariant in the
three sequences are marked with blue boxes; residues discussed in the section on dNTP binding, with blue asterisks.
The two disulfide bridges in the palm (C428:C442, C506:C509) are shown schematically.



reported here is that of a mutant of 9°N-7 pol lack- 
ing detectable exonuclease activity which was 
engineered to prevent degradation of DNA substrates
during subsequent co-crystallization experiments.
This 9°N-7 exo- pol was obtained by
making two point mutations (D141A, E143A) in 
the Exo I (DxE) motif highly conserved among the 
3'-5' exonuclease domains of many DNA pols 
(Derbyshire et al, 1995; Blanco et al, 1992). In the
Klenow fragment (KF) of E. coli DNA pol I, these
residues (D355, E357) are responsible for binding 
the catalytic metals and for hydrogen-bonding 
with the 3'-OH of the terminal deoxynucleotide of 
the substrate DNA (Beese & Steitz, 1991). 

Aside from loop segments that are shorter than 
those observed in RB69 pol (see below), the top- 
ology of the exonuclease domain in 9°N-7 pol is 
very similar to that of RB69 pol. The domains 
superimpose in the central β-sheet, containing the
active site, with a root mean square deviation
(rmsd) of 0.95 Å (35 Cα atoms). The metal-binding
residues not mutated in 9°N-7 exo- pol, D215 and
D315, superimpose almost exactly on the corresponding
RB69 pol residues (D222, D327).
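
For reference, the rmsd values quoted for Cα superpositions can be computed as in the minimal sketch below. It assumes the two coordinate sets are already matched atom for atom and already superimposed; the least-squares fitting step itself (for example, a Kabsch-type rotation) is deliberately omitted. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z).

    Assumption: coords_a[i] and coords_b[i] are equivalent atoms (e.g. Ca
    atoms of structurally aligned residues) in the same reference frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in angstroms: every atom is shifted by 1 along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
value = rmsd(set_a, set_b)
```

Because the value depends on which atoms are included, an rmsd is only meaningful together with the atom count, as reported in the text.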

It is now possible to assign a structural context 
to the four archaeal sequence motifs identified by 
Edgell et al (1997). Three of the regions (A-C) lie 
within the exonuclease domain (Figure 2). Motif A 
forms part of the central β-sheet containing
the active site; B, part of a solvent-exposed loop;
and C, part of a five-stranded β-sheet nearly
perpendicular to the central β-sheet. The fourth
motif resides in the palm (see below). 

Pol domain 

This domain is responsible for the template- 
directed polymerization of dNTPs onto the grow- 
ing primer strand of duplex DNA. Like other poly- 
merases of known structure, the pol domain can be 
further divided into palm, fingers, and thumb subdomains.
While the structures of the thumb of
9°N-7 and RB69 pols are highly similar, differences
exist in the palm and fingers. Some of these
differences correspond to features that appear unique to
archaeal pols, while others support a hypothesis 
that a conformational change occurs in the fingers 
as part of the catalytic cycle. 

Palm subdomain 

The palm, which contains the active site for 
polymerization, shows a high degree of structural 
similarity to the palm subdomain of other DNA 
polymerases. It is as structurally similar to pol I 
family polymerases as to those of the pol α family.
Its rms deviation from RB69 pol around the active
site (blue region in Figure 4(b)) is 0.84 Å (26 Cα
atoms). Together with the Tgo pol structure
(Hopfner et al, 1999), this structure confirms for 
archaea the conservation of a common catalytic 
core. A significant difference between the palm 
subdomains in 9°N-7 and RB69 pols is the two
disulfide bridges present in 9°N-7 pol, one joining 
Cys428 and 442 and another joining Cys506 and 
509 (Figure 4(b)). Both the shortened loops and at 
least one disulfide bridge appear common to 
archaeal pols (see above). Indeed, the region
containing one of the Cys residues in a disulfide bridge
(C442) corresponds to the highly conserved archaeal
motif D (Edgell et al, 1997; Figure 2). The Tgo
pol structure shows the corresponding Cys resi- 
dues to be "poised" for disulfide formation, but 
still in reduced form. 

Until recently it was believed that all pols share 
a catalytic "triad" of carboxylate residues in the 
active site in the palm (Delarue et al, 1990). Wang 
et al. (1997) since recognized that only two of the 
carboxylate residues are invariant. The invariant 
carboxylates in 9°N-7 pol are D404 and D542. The 
third member of the triad, present as D540 in 
9°N-7 pol, is not essential: mutation of the corresponding
residue (D1002N) in human pol α leaves
catalytic function intact (Copeland et al, 1993). D540 in
9°N-7 pol may nevertheless be involved in binding
the divalent metals required for catalysis. Mg2+ is
normally the optimal metal for human pol α
activity. The pol α D1002N mutant shows greater
catalytic efficiency and fidelity with Mn2+ rather
than Mg2+ (Copeland & Wang, 1993).

D540 in 9°N-7 pol lies within hydrogen-bonding
distance of the hydroxyl group of Y538. Substitution
of the corresponding residue with Phe in
human pol α (Y1000) causes only minor effects on
catalysis but alters the pol metal affinity akin to the
pol α D1002 mutation (Copeland & Wang, 1993). It
seems likely that the hydroxyl moiety of Y538 in
9°N-7 pol helps to lock D540 in position for
Mg2+-specific binding. Consistent with this function is the
strict conservation of Y538 among pol α family
members (Braithwaite & Ito, 1993).

Fingers subdomain 

The fingers subdomain of 9°N-7 differs in top- 
ology and relative conformation from RB69. The 
fingers of 9°N-7 pol are a simple helix-coil-helix, as
in Tgo pol (Hopfner et al, 1999), whereas in the fingers
of RB69 pol, the coil region is expanded with
more secondary structure elements (Figures 2 and 
5). The shorter fingers of 9°N-7 pol are conserved 
among the archaeal pols aligned by Edgell et al 
(1997). It is possible that the fingers of archaeal 
pols define a minimal functional unit. 

Different positions of the fingers subdomain rela- 
tive to the palm are observed in the 9°N-7 and 
RB69 pol structures (Figure 5(a)). The fingers of 
Tgo pol (Hopfner et al, 1999) show a position inter- 
mediate between that in 9°N-7 and RB69 pols, 
when the palm subdomains of all three enzymes 
are aligned. It is interesting to note that the fingers 
subdomain of polymerases in the pol I family 
adopts different positions during the catalytic cycle
(reviewed by Doublie et al, 1999). An open pos- 
ition corresponds to that seen in the apoenzyme 
form (Ollis et al, 1985; Kim et al, 1995; Korolev 
et al, 1995; Kiefer et al, 1997) and the form bound 
to duplex DNA (Eom et al, 1996; Kiefer et al, 
1998). A closed conformation has been observed in 
the ternary replication complexes of bacteriophage 
T7 pol (Doublie et al, 1998), and Klentaq (Li et al,
1998) with bound DNA and dNTP. An analogous 
conformational change has been observed in tern- 
ary complexes of human immunodeficiency virus 
reverse transcriptase (Huang et al, 1998) and rat 
pol β (Pelletier et al, 1994). In the closed conformation
the fingers rotate towards the palm to form
a binding pocket for dNTPs. 

The differences in position of the fingers subdomain
in the three pol α family crystal structures
suggest that the fingers of pol α family pols move
during catalysis, analogous to that observed for the 
other polymerase families. It is interesting to note 
that if this is the case, there must be a correspond- 
ing movement in the position of the 3'-5' exo- 
nuclease domains not required in the other 
polymerase families as will be discussed below. If 
the position of the fingers in 9°N-7 pol more 
closely approximates a closed conformation, it is 
not clear why they would adopt a position pre- 
viously observed only in ternary complexes with 
bound dNTP and DNA. The fingers of 9°N-7 pol 
may be stabilized in this conformation because of a 
salt-bridge between E578 in the palm and K487 on 
helix O of the fingers. These residues are highly 
conserved among archaeal pols (Edgell et al, 1997) 
and both the pol I and pol α families (Braithwaite &
Ito, 1993). The corresponding salt-bridge does not 
form in polymerases of the pol I family because 
the fingers helix O lies too far from the palm. The 
fingers of Tgo pol, in fact, are rotated slightly away 
from the active site, relative to 9°N-7 pol, such that 
the E578:K487 salt-bridge cannot form. Another 
possible explanation for the difference in finger 
positions are the disulfide bridges present in 9°N-7 
pol but absent in the Tgo pol structure and in pol I 
family structures. At least one of the disulfides 
(Cys428:442) in 9°N-7 pol could be directly 
involved in orienting the fingers relative to the 
palm (Hopfner et al, 1999). 



Model for DNA and dNTP binding 

Based on the high degree of structural homology 
of the palm subdomains between 9°N-7 and pol I
family pols, DNA and dNTP substrates from the
bacteriophage T7 pol ternary complex (Doublie 
et al, 1998) were modeled into the 9°N-7 pol active 
site. The model shown in Figure 6 provides further 
Figure 3. The RNA-binding motif in the NH2-terminal domain of 9°N-7 pol. (a) Topology diagram of the complete
NH2-terminal domain (residues 1-129, 338-372). The RNA-binding motif βαββαβ, known as the RNP recognition motif
(Burd & Dreyfuss, 1994), is boxed. (b) Sequence alignment of the NH2-terminal domains of 9°N-7 pol, RB69 pol, T4
pol, and archaeal pols. Alignment of 9°N-7 pol and T4/RB69 is based upon the crystal structures, and that of 9°N-7
pol and the other archaeal polymerases is based upon sequence alignment of 13 sequences among those considered
by Edgell et al (1997) (data not shown). The archaeal polymerase alignment was performed with the PILEUP algorithm
in the GCG package (University of Wisconsin Genetic Computer Group). Secondary structure elements corresponding
to 9°N-7 pol are given above the sequence. A consensus sequence was derived for the archaeal
polymerases at those positions where at least 70% of the 13 sequences shared the same residue. Boxed in yellow are
those residues conserved between the archaeal consensus and both bacteriophage (T4, RB69) sequences. Position 367
in 9°N-7 pol is starred (see the text for discussion). Abbreviations are as follows: PFU, Pyrococcus furiosus; SACD,
Sulfolobus acidocaldarius; MJAN, Methanococcus jannaschii; POCC, Pyrodictium occultum B1. (c) Ribbons representation
of the NH2-terminal domain of 9°N-7 (left) and RB69 (right) pols. Least-squares Cα superposition was performed over
the region of 9°N-7 pol including strand 4, part of strand 8, and helices B and C, and the domains were separated for
side-by-side comparison. Shown in green is the βαββαβ RNA-binding motif. Charged and aromatic archaeal consensus
residues are shown with green side-chains, and yellow side-chains correspond to the residues boxed in yellow in
(b). The loop between β strands 7 and 8 in 9°N-7 pol corresponds to the conformationally variable loop 3 in the
canonical RNP motif (Shamoo et al, 1997).



evidence that the position of the fingers in 9°N-7 
pol more closely approximates a closed confor- 
mation and their position in RB69 pol approxi- 
mates an open conformation. This model of a 
ternary complex for a pol α family polymerase
places the dNTP within hydrogen-bonding dis- 
tance of residues on the fingers O helix that are 
highly conserved and known by mutagenesis to be 
functionally important. The corresponding residues 
on fingers helix P of the RB69 pol are farther away
and cannot directly interact with dNTP. 

The model places residues Y409 and Y494 near 
the deoxyribose moiety of the incoming dNTP. 
These residues appear to be functionally analogous 
to E480 and Y526 of T7 pol, which are responsible 
for discriminating between deoxy- and ribonucleotides
(rNTPs). Y409 is invariant among the pol α
family in the alignment by Braithwaite & Ito (1993) 
and nearly invariant (one exception) among 
archaeal pols aligned by Edgell et al (1997). 
Mutation of the corresponding residue (Y412) 
to Val in an exonuclease-deficient Thermococcus
litoralis (Vent) pol causes a 200-fold loss of
discrimination against rNTPs. The aromatic ring 
appears to be the functionally important moiety, 
as mutating Y412 to Phe conserves wild-type 
discrimination levels (Gardner & Jack, 1999). 

Y526 in T7 pol (F762 in Klenow fragment) has 
been dubbed the "ribose selectivity site" (Tabor & 
Richardson, 1995). A Phe residue at this position 
confers selectivity against incorporation of dideox- 
yribonucleotides (ddNTPs), whereas a Tyr residue 
in this position allows efficient incorporation of 
both nucleotide species. The presence of Tyr (Y494) 
in this position in 9°N-7 pol suggests the ability to 
incorporate dideoxynucleotides, as has been shown for Vent
(Gardner & Jack, 1999) and human pol α (Copeland
et al, 1992). In fact, Tyr is invariant at this
position among the archaeal pols aligned by Edgell
et al (1997), and highly conserved in the pol α
family aligned by Braithwaite & Ito (1993). 

The model of a ternary complex with dNTP and 
DNA places residues N491 and K487 in hydrogen- 
bonding distance from the triphosphate moiety of 
Figure 4. Comparisons of 9°N-7
and RB69 pols in different (sub)domains
to indicate loop segments
that are shorter in 9°N-7 pol.
Least-squares Cα superposition was
performed over the region in blue, and
the domains were separated for
side-by-side comparison. Loop
regions are shown in magenta and
their residue endpoints are labeled.
(a) Comparison of the exonuclease
domains. Indicated with purple
asterisks are the active site carboxylates
(mutated to Ala in the case of
the 9°N-7 exo- pol used in this
study). (b) Comparison of the palm
domains. The three active-site
carboxylate groups are depicted with
side-chains.



the incoming dNTP. Both of these residues are 
invariant in the pol α family (Braithwaite & Ito,
1993), and nearly invariant (one exception) among 
archaeal pols (Edgell et al, 1997). Mutation of the 
corresponding residues (N494, K490) in Vent (exo-) 
pol severely decreases enzyme activity (Gardner & 
Jack, 1999). 

Concerted domain movement 

The difference in position of the fingers sub- 
domain in 9°N-7 and RB69 pols is part of a larger 
conformational change involving the 3'-5' exonuclease
and NH2-terminal domains. Comparing
these two pol structures shows that in one of the 
pair, an essentially rigid-body rotation has 
occurred involving three of the five (sub)domains. 
This concerted movement affects both the position 
of the fingers relative to the pol active site (open
versus closed conformation), as well as the position
of the exonuclease active site relative to the pol 
active site. The 9°N-7 and RB69 pol structures may 
approximate different states along the reaction 
pathway corresponding to DNA synthesis and 3'-5'
exonucleolytic proofreading activities.

When these two polymerases are aligned in the 
palm (the blue region in Figure 4(b)), the exo- 
nuclease and fingers are displaced between the 
proteins (Figure 5(a)). If the enzymes are aligned in 
the exonuclease domain (see Figure 4(a)), the 
fingers superimpose almost exactly (Figure 5(b)). 
Moving from a palm to an exonuclease-based 
alignment also brings the first module (residues 
1-31) of the NH2-terminal domains into identical
positions (not shown). The joint motion of the first
NH2-terminal module and the exonuclease may
reflect the need to maintain ionic networks at the 
interface. There are two five-membered ionic net- 
Figure 5. Least-squares Cα superpositions
of 9°N-7 and RB69 pols
in the (a) palm subdomain or
(b) exonuclease domain. The 9°N-7
pol backbone is shown in yellow,
and its active-site carboxylate
groups in gold. The RB69 pol backbone
is drawn in green, and its
active-site residues in magenta. The
central β-sheet of the exonuclease
domain is light blue (9°N-7 pol) or
dark blue (RB69 pol) to allow
tracking of the domain motion. The
precise regions used in the palm
and exonuclease superpositions are
shown in Figure 4. The NH2-terminal
domain has been omitted for
clarity. Arrows in (a) indicate the
direction of fingers and exonuclease
movement when moving from (a)
to (b).




works formed between the first module and exo- 
nuclease (Figure 7). In addition, a three-membered 
network is formed between the third NH2-terminal module
(R346) and the exonuclease (Figure 7). This net- 
work is conserved among nearly all archaeal pols 
(Edgell et al, 1997), but none is present in RB69 
pol. 

Comparison of the Tgo pol structure (Hopfner 
et al, 1999) with that of 9°N-7 and RB69 pols using 
palm and exonuclease-based superpositions gives 
results similar to those in Figure 5, providing 
further support for the notion of a concerted 
domain movement. 

A model was constructed for the RB69 pol 
(Wang et al, 1997) showing how substrate DNA 
could shuttle between the pol and exonuclease 
active sites. When 9°N-7 and RB69 pols are aligned 
in the palm, the exonuclease active site in the 
former is tilted out and away from the pol active 
site, making it impossible for the DNA to shuttle. 
The exonuclease position in RB69, but not that in 
9°N-7 pol, is therefore consistent with an editing 
conformation. It is interesting that this conformation
also means that the fingers are not in
position to bind dNTP (see above). Taken together, 
these considerations suggest that during the 
replication cycle of family B pols, there is concerted 
movement of the exonuclease, NH2-terminal
domain, and fingers relative to the catalytic region 
of the palm. 

This concerted movement may be the structural 
basis for the functional coupling of polymerase 
and exonuclease domains, which is unique to the 
pol α family. In this family it is possible to generate
site-directed mutations in one domain that exert an 
indirect, negative effect on the other (Reha-Krantz 
& Nonay, 1993; Abdus Sattar et al, 1996). This con- 
trasts with pol I pols like KF, where these activities 
are completely confined to their respective 
domains (Ollis et al, 1985). 

Molecular basis of thermostability 

Thermococcus sp. 9°N-7 grows at temperatures of 
88-90 °C, and its pol has a temperature optimum of 
70-80 °C (Perler et al, 1996). It has a half-life of 6.7 
Figure 6. The active site of 9°N-7
pol and a modeled ternary complex.
(a) Stereoview of the active
site. Residues with indicated
side-chains are discussed in the text.
Hydrogen bonds are shown as broken
lines, and the two disulfide bridges
are shown in violet. K487 in this
structure is involved in a salt-bridge
with E578 of the palm. (b)
Model of a ternary complex of
9°N-7 pol. For clarity only the
incoming base and the first
primer template base-pair are
shown. Hydrogen bonds are shown
as broken lines, and metal ions are
modeled as green spheres. The
9°N-7 pol and T7 pol ternary complex
(Doublie et al., 1998) were
superimposed in the palm (0.55 Å
rmsd for 13 Cα atoms). The rotamer
conformation was adjusted for
D542 and D404 in 9°N-7 pol, and
the β turn including D542 was
tilted downward, in a motion analogous
to that observed between
the apoenzyme and binary complex
structures of Bacillus stearothermophilus
pol (Kiefer et al., 1997, 1998).



hours at 95 °C (R.B. Kucera, unpublished results), 
whereas Thermus aquaticus (Taq) DNA pol has a 
half-life of 1.6 hours at 95 °C (Kong et al, 1993). The 
structure of 9°N-7 pol indicates a few key strategies 
for this hyperthermostability, some of which appear 
general to archaeal DNA pols. 
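
To put the two half-lives on a common scale, the sketch below converts a half-life into the fraction of activity remaining after a given time. It assumes simple first-order (single-exponential) inactivation, which the text does not state; the function and variable names are hypothetical, and only the two half-life values come from the text.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of activity left after `hours`, assuming first-order decay."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours / half_life_hours)


# Half-lives at 95 degrees C as quoted in the text (hours).
HALF_LIFE_9N7 = 6.7
HALF_LIFE_TAQ = 1.6

# Ratio of half-lives, and activity remaining after one Taq half-life.
half_life_ratio = HALF_LIFE_9N7 / HALF_LIFE_TAQ
taq_left = fraction_remaining(HALF_LIFE_TAQ, HALF_LIFE_TAQ)
pol_9n7_left = fraction_remaining(HALF_LIFE_TAQ, HALF_LIFE_9N7)
```

Under this assumption, 9°N-7 pol retains well over half of its activity at the point where Taq pol has lost half of its own; real inactivation curves may deviate from a single exponential.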

A surprising feature of the 9°N-7 pol is that it contains two 
disulfide bridges (Figures 1(a) and 6(a)). The potential for the same 
bridges to form was also observed in Tgo pol (Hopfner et al., 1999). 
Although not normally the case in Bacteria or Eucarya, an increasing 
number of cytosolic proteins with disulfide bridges are being 
discovered in the Archaea (DeDecker et al., 1996; Singleton et al., 
1999). The stabilizing role of disulfide bridges has been well 
documented (Gokhale et al., 1994; Cooper et al., 1992). Introduction 
of disulfide bridges therefore appears to be a common strategy for 
archaeal protein stability. 

Alignment of a large number of archaeal pols (Edgell et al., 1997) 
suggests that having at least one of these disulfides is important for 
their thermostability. In fact, the two-stranded β-sheet containing 
C442 corresponds to sequence motif D in archaeal pols (Edgell et al., 
1997). Based on whether Cys is present in the corresponding positions, 
all the pols discussed by Edgell et al. (1997) are predicted to have at 
least one of the two disulfide bridges seen in 9°N-7 pol, with the 
exception of M. voltae and S. shibatae B3 pols. The mesostability of 
M. voltae pol may be partly caused by a lack of disulfide bridges. The 
S. shibatae B3 pol, like the S. solfataricus P2 B3 pol, is highly 
divergent in sequence from other archaeal pols, and it is unclear 
whether either of these functions in vivo (Edgell et al., 1997). 

An increased number of salt-bridges relative to mesostable homologs is 
often cited as a determinant of protein thermostability (DeDecker 
et al., 1996; Korndorfer et al., 1995; Chan et al., 1995; Hennig 
et al., 1995). The 9°N-7 pol shows a substantial increase in the 
fraction of charged residues participating in salt-bridges (47%) 
compared with RB69 pol (39%). These results are similar to those of a 
thermostability study of Pyrococcus furiosus glutamate dehydrogenase 
(Yip et al., 1995). The authors of that study found a marked 
preference for Arg residues in the ionic interactions of the 
thermostable enzyme, but no such preference is evident here. The same 
fraction (48%) of Arg residues is used in ionic interactions in both 
9°N-7 and RB69 pols, whereas a much higher proportion of Glu residues 
participate in salt-bridges in the 9°N-7 pol (53%) compared with RB69 
pol (33%). 

Figure 7. The extensive ionic networks at the interface of the 
NH₂-terminal and 3'-5' exonuclease domains. 

The number and distribution of salt-bridges within domains do not 
substantially differ between 9°N-7 and RB69 pols. At the interfaces 
between (sub)domains, however, the differences in ionic networks are 
striking. The proportion of ionic interactions at interfaces in the 
9°N-7 pol (21%) is over twice that in RB69 pol (9%). The differences 
lie at the interface of the exonuclease domain with the NH₂-terminal 
domain (Figure 7), and at the interface of the exonuclease with the 
thumb, where a two-member and a three-member ionic network occur in 
9°N-7 pol compared with none in RB69 pol (not shown). 

Burial of the charged termini of proteins has been cited as another 
factor that can confer thermostability (Hennig et al., 1995). The 
NH₂-terminal methionine (M1) of 9°N-7 pol is stabilized by a 
hydrophobic cluster formed by L135, F327, I256, V205, and L341, while 
the corresponding residue of RB69 pol is completely exposed to solvent. 
The B-factor for the Cα of M1 in 9°N-7 pol is 26 Å², whereas for M1 in 
RB69 pol, it is 95 Å². While burial of the N terminus may be important 
for the thermostability of the 9°N-7 pol, the same does not hold for 
the C terminus. The last 25 residues are not visible in the electron 
density, similar to the case of RB69 pol. The solvent accessibility of 
the C terminus of these pols may reflect the need for this region to 
interact with a processivity accessory protein, which is known to be 
the case in the T4 replication complex (Berdis et al., 1996). 
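
The B-factors quoted above for the Cα of M1 can be made more tangible by converting them to root-mean-square atomic displacements. The sketch below uses the standard isotropic relation B = 8π²⟨u²⟩; treating these B-factors as purely isotropic and directly comparable between two independently refined structures is an assumption made only for illustration. 

```python
import math


def rms_displacement(b_factor: float) -> float:
    """RMS displacement in Å from an isotropic B-factor in Å^2 (B = 8*pi^2*<u^2>)."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B-factors for the C-alpha atom of M1, as quoted in the text (Å^2).
B_CA_M1 = {"9°N-7 pol": 26.0, "RB69 pol": 95.0}

for name, b in B_CA_M1.items():
    print(f"{name}: B = {b:.0f} Å^2, rms displacement ~ {rms_displacement(b):.2f} Å")
```

This gives roughly 0.57 Å for 9°N-7 pol and 1.10 Å for RB69 pol, consistent with a buried versus a solvent-exposed terminus, although B-factors also depend on resolution and refinement protocol. 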

Another common strategy for protein thermostability is to lower the 
solvent-accessible surface area of the protein and to increase the 
proportion of buried structure (Korndorfer et al., 1995; Chan et al., 
1995). This translates into a more compact structural design. There 
are at least 12 examples of loop segments in RB69 pol that are much 
shorter or absent in 9°N-7 pol. Some of the more striking examples are 
shown in Figure 4. Alignment of 16 archaeal pols (Edgell et al., 1997) 
indicates that they share practically all of these sequence 
"deletions". The Tgo pol structure also revealed shortened loop 
segments relative to RB69 pol (Hopfner et al., 1999). Nevertheless, the 
overall ratio of solvent-accessible surface area to volume for both 
9°N-7 and RB69 pols is the same (0.33). Thus, while lowering the 
surface area to volume ratio is a common strategy for thermostability, 
it is not the primary basis for the stability of 9°N-7 pol. 

Materials and Methods 

Purification, crystallization, and data collection 

Thermococcus sp. 9°N-7 polymerase (wild-type and the D141A,D143A 
exonuclease-deficient mutant) was overexpressed and purified as 
described (Southworth et al., 1996). Crystallization, cryoprotection, 
data collection and reduction of native crystals are described (Zhou 
et al., 1998). Derivatives were prepared by soaking native crystals in 
stabilization solution (Zhou et al., 1998) supplemented with 22.7 mM 
sodium ethylmercurithiosalicylate (thimerosal) for 11 days 
(thimerosal-1), 3.0 mM K₂PtCl₄ for one hour (PtCl-1), 1.5 mM 
di-μ-iodo-bis(ethylenediamine)diplatinum(II) nitrate (PIP) for one day 
(PIP-1), or 1.0 mM Baker's mercurial for 50 hours (BAHg). These 
crystals were stepped through stabilization solution containing 8% 
(five minutes), 16% (five minutes), and 30% sucrose (one to five 
hours). Additional derivatives were collected with the improved 
cryoprotection procedure reported (Zhou et al., 1998) by soaking native 
crystals in 23.0 mM thimerosal for 83 days (thimerosal-2), 3.0 mM 
K₂PtCl₄ for seven days (PtCl-2), and 1.5 mM PIP for 35 hours (PIP-2). 

Structure determination 

The structure of the D141A,D143A mutant of 9°N-7 polymerase was 
determined by the method of multiple isomorphous replacement (MIR). A 
number of native and derivative crystals were used to solve the 
structure because of problems with non-isomorphism (Table 1). Three 
native datasets were collected from single crystals. NAT-1 was mounted 
in the liquid nitrogen stream directly from cryoprotectant, whereas 
NAT-2 and NAT-3 were flash-frozen in liquid nitrogen prior to mounting. 
The crystals belong to space group P2₁2₁2₁ with unit cell dimensions 
of approximately a = 96.1 Å, b = 101.1 Å, c = 112.2 Å (for NAT-3). One 
molecule is present per asymmetric unit, giving a solvent content of 
approximately 60%. 
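
The quoted solvent content can be checked with a Matthews-coefficient estimate. The sketch below is illustrative only: the molecular mass of about 90 kDa is an assumption (it is not given in this section), and the factor 1.23 is the conventional value for a typical protein partial specific volume. Four asymmetric units per cell follows from space group P2₁2₁2₁ with one molecule per asymmetric unit. 

```python
# Unit cell of NAT-3 from the text, in Å; orthorhombic, so V = a * b * c.
CELL_A, CELL_B, CELL_C = 96.1, 101.1, 112.2

# P2(1)2(1)2(1) has four asymmetric units per cell, one molecule in each.
MOLECULES_PER_CELL = 4

# ASSUMPTION: approximate molecular mass of the polymerase in Da (not from the text).
ASSUMED_MASS_DA = 90_000.0


def matthews_coefficient(a: float, b: float, c: float, z: int, mass_da: float) -> float:
    """Crystal volume per unit of protein mass (Å^3/Da) for an orthorhombic cell."""
    return (a * b * c) / (z * mass_da)


def solvent_fraction(vm: float) -> float:
    """Estimated solvent fraction, using the conventional 1.23 / Vm protein fraction."""
    return 1.0 - 1.23 / vm


VM = matthews_coefficient(CELL_A, CELL_B, CELL_C, MOLECULES_PER_CELL, ASSUMED_MASS_DA)
SOLVENT = solvent_fraction(VM)
print(f"Vm ~ {VM:.2f} Å^3/Da, solvent content ~ {SOLVENT:.0%}")
```

With these inputs the estimate comes out close to the approximately 60% stated above; a different assumed mass would shift it by a few percent. 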

A difference Patterson map of thimerosal-1 was calculated using the 
program FFT in the CCP4 suite (CCP4, 1994). One heavy-atom site for 
this derivative was identified with the program RSPS (Knight, 1989). 
This site was used to calculate initial phases for NAT-1 at 5 Å 
resolution using the program MLPHARE (Otwinowski, 1991). Difference 
Fourier synthesis with the initial phases revealed three sites for the 
PtCl-1 derivative. Two more sites for this derivative were discovered 
with the phases derived from both thimerosal-1 and PtCl-1. The correct 
handedness of the phasing information from these derivatives was 
determined using MLPHARE, and anomalous scattering data from the 
derivatives were included in the phase calculation. Three sites for the 
BAHg derivative and four sites for PIP-1 were obtained from difference 
Fourier maps calculated to 5 Å resolution. All of these heavy-atom 
sites were included in subsequent phase calculations with NAT-1. The 
high-resolution phasing limit was extended to 3.5 Å. Because of the 
high solvent content in the crystals, use of the solvent-flattening 
program DM (Cowtan, 1994), in combination with histogram matching, 
improved the phases substantially. A polyalanine model was built into 
the improved electron density map of NAT-1 with the program O (Jones & 
Kjeldgaard, 1993) and refined in the program X-PLOR (Brünger, 1992). 
Phase combination using the program SIGMAA (Read, 1986) further 
improved the map during building and refinement. 



Identification of side-chain densities was possible only after 
collecting a higher-resolution native dataset (NAT-2), along with 
diffraction data for three more derivatives obtained under improved 
cryoprotection conditions (thimerosal-2, one site; PtCl-2, four sites; 
PIP-2, five sites). These derivatives were used to calculate MIR phases 
of NAT-2 to 3.0 Å resolution. Partial model phases of NAT-2 were 
calculated using the refined polyalanine model derived from NAT-1. 
Because of significant differences in unit cell dimensions between 
NAT-1 and NAT-2, it was first necessary to subject NAT-2 to rigid-body 
refinement against NAT-1 in X-PLOR. Combination of the polyalanine 
model phases and MIR phases with SIGMAA improved the electron density 
map. Model building, refinement, and phase combination were reiterated 
until a complete polyalanine model could be built. In the final stage 
of refinement, NAT-3 was used to extend the resolution limit to 2.1 Å 
and water molecules were added. 



Coordinate files and illustrations 

The Thermococcus sp. 9°N-7 polymerase atomic coordi- 
nates and structure factors have been deposited in the 
RCSB Protein Data Bank under the accession code 
1QHT. The RB69 coordinates used for comparisons in 
this manuscript are those of the orthorhombic crystal 
form (accession code 1WAJ). Figures were prepared 
within the IRIS Showcase program (Silicon Graphics, 
Inc.) entirely (1(b), 2, 3(a) and 3(b)) or with images 
imported from MOLSCRIPT (1(a)) (Priestle, 1991) or 
SETOR (3(c), 4-7) (Evans, 1993). 

Appendix I 

We have purified and characterized the Family B DNA polymerase from 
the archaeon Methanococcus maripaludis, cloned from ATCC 43000. This 
polymerase has 41% sequence identity and 63% sequence similarity with 
Vent DNA Polymerase when analyzed using NCBI Blast 2 with the default 
parameters. 

We performed the titration assay described in Example 1 of the patent 
application, using the M. maripaludis (Mma), Vent (exo-), and 9°N (exo+) 
DNA Polymerases. Experimental details and data are given in the attached 
figure. 

For each of the three polymerases, a comparison of lanes using 
dideoxyCTP (ddCTP) with those using equivalent concentrations of acycloCTP 
(acyCTP) reveals shorter products in lanes utilizing acyCTP. These shorter 
products result from more efficient insertion of the acyCTP terminator 
compared to incorporation of the ddCTP terminator. Thus, all three 
polymerases incorporated acyCTP more efficiently than ddCTP. 

Figure Legend 

The ability of acyNTPs and ddNTPs to act as chain terminators was 
tested using a titration assay of the type described in Example 1. 
Incorporation of ddCTP was compared to that of acyCTP using 
Methanococcus maripaludis DNA polymerase, 9°N (exo+) DNA polymerase, 
and Vent® (exo-) DNA polymerase. 

Incorporation of ddCTP and acyCTP was assayed by mixing 8 µl of 
reaction cocktail (0.025 µM 5' [FAM] end-labeled #1224-primed M13mp18, 
62.5 mM NaCl, 12.5 mM Tris-HCl (pH 7.9 at 25°C), 12.5 mM MgCl₂, 1.25 mM 
dithiothreitol, Methanococcus maripaludis DNA polymerase or 0.125 U/µl 
9°N (exo+) DNA polymerase or 0.125 U/µl Vent® (exo-) DNA polymerase) 
with 2 µl of 5X nucleotide analog/nucleotide solution to yield the final 
ratios of analog:dNTP indicated in the figures. After incubating at 72°C 
for 20 minutes, the reactions were halted by the addition of 10 µl 
formamide. Samples were then heated at 72°C for 3 minutes and a 1 µl 
aliquot was loaded on a 4% polyacrylamide urea gel and detected by an 
ABI377 automated DNA sequencer. 
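
Because 8 µl of cocktail is combined with 2 µl of 5X nucleotide solution, each cocktail component is diluted to 0.8 of its stated concentration in the 10 µl reaction. The short sketch below works through that arithmetic; the dictionary keys and function name are illustrative, and the M. maripaludis polymerase is omitted because its concentration is not stated above. 

```python
COCKTAIL_UL = 8.0        # volume of reaction cocktail per reaction
NUCLEOTIDE_MIX_UL = 2.0  # volume of 5X nucleotide analog/nucleotide solution
TOTAL_UL = COCKTAIL_UL + NUCLEOTIDE_MIX_UL

# Cocktail concentrations as listed in the figure legend.
COCKTAIL = {
    "primed M13mp18 (uM)": 0.025,
    "NaCl (mM)": 62.5,
    "Tris-HCl (mM)": 12.5,
    "MgCl2 (mM)": 12.5,
    "dithiothreitol (mM)": 1.25,
    "9°N (exo+) or Vent (exo-) polymerase (U/ul)": 0.125,
}


def dilute(concentration: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after bringing `volume_ul` of stock to `total_ul`."""
    return concentration * volume_ul / total_ul


FINAL = {name: dilute(c, COCKTAIL_UL, TOTAL_UL) for name, c in COCKTAIL.items()}
NUCLEOTIDE_FOLD = dilute(5.0, NUCLEOTIDE_MIX_UL, TOTAL_UL)  # 5X stock in the reaction

for name, value in FINAL.items():
    print(f"{name}: {value:g}")
print(f"nucleotide solution: {NUCLEOTIDE_FOLD:g}X")
```

The cocktail is therefore a 1.25X stock: the final reaction contains 50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl₂, 1 mM dithiothreitol, 0.02 µM primed template and 0.1 U/µl of the 9°N or Vent enzyme, with the nucleotide solution at 1X. 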